COMPLEXITY REDUCTION FOR HEVC INTRAFRAME LUMA MODE DECISION USING IMAGE STATISTICS AND NEURAL NETWORKS.




DILIP PRASANNA KUMAR

1000786997














UNDER GUIDANCE OF DR. RAO

UNIVERSITY OF TEXAS AT ARLINGTON.

DEPT. OF ELECTRICAL ENGINEERING


1. INTRODUCTION


HEVC, or the High Efficiency Video Coding standard, is a new standard being developed as a
joint project by ITU-T VCEG and ISO/IEC MPEG, working together in a partnership known as
the Joint Collaborative Team on Video Coding (JCT-VC) [1]. It has been designed to address all
existing applications of H.264/MPEG-4 AVC and to particularly focus on two key issues: increased
video resolution and increased use of parallel processing architecture [2].

HEVC aims to achieve a 50% bit-rate reduction over H.264 while maintaining a similar visual
quality [2]. The compression is achieved at the cost of increased encoder complexity. It supports
visually lossless and lossless compression, and picture sizes from QVGA (320x240) to ultraHD
(8k x 4k). It allows random access, fast channel switching, “trick modes” and intra-only coding
modes.


HEVC employs a traditional hybrid coding model with temporal and spatial prediction, spatial
transforms, quantization, entropy coding and in-loop filtering. The block diagram for a typical
HEVC encoder is shown in figure 1.


Figure 1: HEVC encoder block diagram. [1]

Figure 2: HEVC decoder block diagram. [12]

The HEVC encoder is functionally similar to that of older video coding standards such as
H.264/AVC. The notable differences are the new coding tree unit instead of the macroblock, new
in-loop filtering techniques and a single entropy coding method, Context Adaptive Binary
Arithmetic Coding (CABAC). However, modifications have been made to almost all aspects of the
encoder and these differences contribute to the increased compression achieved by HEVC [1].

2. CODING TREE UNIT IN HEVC


The most significant change is the new coding tree unit, which replaces the macroblock. In
principle, the quadtree coding structure in HEVC is described by means of “blocks” and “units”.
A block defines an array of samples and sizes thereof, whereas a unit encapsulates one luma block
and the corresponding chroma blocks together with the syntax needed to code them. Consequently,
a Coding Tree Unit (CTU) includes coding tree blocks (CTB) and syntax specifying coding data and
further subdivision. This subdivision results in coding unit (CU) leaves with coding blocks (CB).
Each CU incorporates further entities for the purpose of prediction, so-called prediction units
(PU), and of transform, so-called transform units (TU). Similarly, each CB is split into
prediction blocks (PB) and transform blocks (TB) [3]. Figure 3 shows the coding units for a frame
from the sequence Traffic.



Figure 3: Detail of 4kx2k sequence Traffic showing the coding block (white) and nested transform
block (red) structure resulting from recursive quadtree structure. [3]


The coding units are traversed in a Z-scan order as shown in figure 4. The sizes of the luma CTB
can be chosen as 16x16, 32x32 or 64x64 samples [2]. The larger sizes typically enable better
compression [4].
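
The Z-scan traversal can be illustrated with a short recursive sketch. This is only an
illustration of the ordering, not code taken from HM; the function name z_scan_order and the
choice of a 16x16 block split down to 4x4 leaves are assumptions made for the example.

```python
def z_scan_order(x0, y0, size, min_size):
    """Yield the top-left (x, y) of every min_size block inside a square block,
    visiting the four quadrants recursively in Z order."""
    if size == min_size:
        yield (x0, y0)
        return
    half = size // 2
    # Quadrant order: top-left, top-right, bottom-left, bottom-right.
    for dy in (0, half):
        for dx in (0, half):
            yield from z_scan_order(x0 + dx, y0 + dy, half, min_size)


# Hypothetical example: a 16x16 block traversed in 4x4 units.
order = list(z_scan_order(0, 0, 16, 4))
print(order[:5])
```

The first four positions cover the top-left 8x8 quadrant before the traversal moves to the
top-right quadrant, which is the nesting shown in figure 4.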


Figure 4: HEVC Z-scan order for traversing the coding units. Figure adapted from the documentation
accompanying the HM 8.0 source code [1][2].







3. INTRA PREDICTION IN HEVC FOR LUMA SAMPLES

HEVC introduces 33 directional modes, a planar mode and a DC mode for intra prediction of luma
samples [Fig 5]. The angles are intentionally designed to provide denser coverage for
near-horizontal and near-vertical angles and coarser coverage for near-diagonal angles [1]. The
angle parameters are 0, ±2, ±5, ±9, ±13, ±17, ±21, ±26 and ±32, expressed as sample displacements
in units of 1/32 sample relative to the horizontal and vertical directions. In addition to this,
HEVC supports two alternative prediction methods, planar and DC, to target regions which lack
strong directional edges.
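
The uneven spacing of the directions can be made concrete with a small lookup. This is a minimal
sketch, assuming the angle parameter table of the HEVC draft [1], in which each value is a
displacement in 1/32-sample units (including ±13), mode 10 is the exactly horizontal mode and
mode 26 is the exactly vertical mode. The names MAGNITUDES and intra_pred_angle are chosen for
this example.

```python
# Displacement magnitudes in 1/32-sample units, from exactly horizontal or
# vertical (0) out to the diagonal (32).
MAGNITUDES = [0, 2, 5, 9, 13, 17, 21, 26, 32]


def intra_pred_angle(mode):
    """Return the angle parameter for an angular luma mode (2 to 34)."""
    if not 2 <= mode <= 34:
        raise ValueError("angular modes are numbered 2 to 34")
    if mode <= 18:
        offset = mode - 10  # horizontal family, centred on mode 10
        sign = -1 if offset > 0 else 1
    else:
        offset = mode - 26  # vertical family, centred on mode 26
        sign = 1 if offset > 0 else -1
    return sign * MAGNITUDES[abs(offset)]


angles = {mode: intra_pred_angle(mode) for mode in range(2, 35)}
# Step between neighbouring directions: small near horizontal/vertical,
# larger towards the diagonals.
steps = [b - a for a, b in zip(MAGNITUDES, MAGNITUDES[1:])]
print(steps)
```

The step between neighbouring directions grows from 2 near the horizontal and vertical axes to
6 near the diagonals, which is the denser coverage described above.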




Figure 5: Luma modes for intra prediction in HEVC. Mode 0 is planar mode and Mode 1 is DC mode. [1]

The prediction process of the Intra_Angular modes can involve extrapolating samples
from the projected reference sample location according to a given directionality. To
remove sample-by-sample switching between the reference row and column buffers, all
extrapolations in a PB refer to a single reference row or column depending on the mode
number [1].

The reference samples used for intra prediction are sometimes filtered by a 3-tap
[1 2 1]/4 smoothing filter, in a manner similar to what was used for 8×8 intra prediction in
H.264/MPEG-4 AVC. However, HEVC applies this smoothing operation more adaptively according to
the directionality and the block size. As in H.264/MPEG-4 AVC [2], the smoothing filter is not
applied for 4×4 blocks. For 8×8 blocks, only the diagonal directions (2, 18 and 34) use the
reference sample smoothing. For 16×16 blocks, the reference samples are filtered for most
directions except the near-horizontal and near-vertical directions (directions in the range of
9 to 11 and 25 to 27). For 32×32 blocks, all directions except the exactly horizontal and
exactly vertical directions use the smoothing filter. The Intra_Planar mode also uses the
smoothing filter when the block size is greater than or equal to 8×8, and the smoothing is not
used (or useful) for the Intra_DC case [2].
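
The filter and the size-dependent rules described above can be written down as a short sketch.
It follows the rules as stated in this section rather than the HM source; the function names, the
integer rounding offset and the choice to leave the two end samples unfiltered are assumptions
made for the example.

```python
PLANAR, DC = 0, 1  # mode numbers as in figure 5


def smooth_reference(samples):
    """Apply the [1 2 1]/4 filter to a line of reference samples.
    The two end samples are copied unfiltered (an assumption of this sketch)."""
    out = list(samples)
    for i in range(1, len(samples) - 1):
        # "+ 2" rounds to nearest before the divide by 4.
        out[i] = (samples[i - 1] + 2 * samples[i] + samples[i + 1] + 2) >> 2
    return out


def uses_smoothing(block_size, mode):
    """Decide whether reference smoothing applies, per the rules in the text."""
    if block_size not in (4, 8, 16, 32):
        raise ValueError("block size must be 4, 8, 16 or 32")
    if mode == DC or block_size == 4:
        return False
    if mode == PLANAR:
        return True  # planar is smoothed for 8x8 and larger
    if block_size == 8:
        return mode in (2, 18, 34)  # diagonal directions only
    if block_size == 16:
        return not (9 <= mode <= 11 or 25 <= mode <= 27)
    return mode not in (10, 26)  # 32x32: all but exactly horizontal/vertical


print(smooth_reference([0, 0, 8, 0, 0]))
print(uses_smoothing(16, 10), uses_smoothing(16, 12))
```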


Figure 6: Reference samples used with mode 29. [1]
















4. PROBLEM STATEMENT

Intra prediction in HEVC is computationally complex because of the large number of possible
modes. This is further complicated by the quad tree structure: the encoder must decide both the
quad tree depth and the prediction direction for each coding unit. A brute force search for the
best intra mode consists of evaluating 735 modes per LCU (see footnote 1).
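
The count of 735 can be reproduced by counting 35 candidate modes for a block and for each of
its four sub-blocks, recursively. The sketch below assumes three quadtree levels, which is the
depth implied by the figure of 735; the function name is chosen for this example.

```python
def brute_force_mode_count(levels, modes_per_block=35):
    """Number of mode evaluations when every block at every quadtree level
    tests all modes_per_block candidates."""
    if levels < 1:
        raise ValueError("levels must be at least 1")
    if levels == 1:
        return modes_per_block
    # This block's candidates plus those of its four sub-blocks.
    return modes_per_block + 4 * brute_force_mode_count(levels - 1, modes_per_block)


print(brute_force_mode_count(3))  # 35 + 4 * (35 + 4 * 35)
```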


Figure 7: The recursive coding tree search employed by the HM encoder. [2]

Evaluating each mode consists of generating a prediction, subtracting the prediction from the
original image and obtaining the residual. This is then used to compute the cost for rate
distortion optimization.
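
A minimal sketch of this evaluation is given below, assuming the usual Lagrangian form of the
cost, J = D + lambda * R. The distortion measure (sum of squared errors), the value of lam and
the rate_bits argument are placeholders chosen for illustration; the cost functions actually
used by HM are not reproduced here.

```python
def residual(original, prediction):
    """Sample-wise difference between the original block and its prediction."""
    return [[o - p for o, p in zip(orow, prow)]
            for orow, prow in zip(original, prediction)]


def sse(res):
    """Sum of squared errors of a residual block."""
    return sum(v * v for row in res for v in row)


def rd_cost(original, prediction, rate_bits, lam):
    """Lagrangian cost J = D + lam * R, with D measured as SSE (an assumption)."""
    return sse(residual(original, prediction)) + lam * rate_bits


# Hypothetical 2x2 block and a flat prediction.
block = [[10, 12], [14, 16]]
flat_prediction = [[13, 13], [13, 13]]
print(rd_cost(block, flat_prediction, rate_bits=4, lam=2.0))
```

A brute force encoder repeats this for every candidate mode and keeps the mode with the lowest
cost, which is why reducing the candidate list reduces the encoding time directly.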

The reference software HM 8.0 employs this brute force method, and it has been shown that in
the all intra mode, a quarter of the total time is spent on rate-distortion optimization and an
additional 16% on intra prediction [3].

The problem being considered is to evaluate a method that reduces the number of modes that need
to be evaluated for each coding unit, and to generate data regarding its performance. It is hoped
that similar attempts by other researchers will generate sufficient data so that the best method
can eventually be determined by comparing the results of all proposed methods.











Footnote 1: 35 modes + 4 × (35 modes + 4 × 35 modes) = 735 modes


5. HYPOTHESIS

The hypothesis is that the optimum coding unit size, and hence the quad tree depth, can be
determined as a function of the variance of the pixel values within a coding unit and the
correlation to the reference pixels. Also, the statistics of the most frequently used tree depth
among the neighboring CUs can be helpful in determining the order in which the possible modes can
be tested.
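
The two statistics named in the hypothesis can be computed as sketched below. How they should
be combined into a depth decision is exactly what the proposed work sets out to determine, so
the should_split rule, its thresholds and the choice to correlate the top row of the block with
the reference row above it are purely hypothetical placeholders.

```python
from math import sqrt


def block_variance(block):
    """Population variance of all pixel values in a block."""
    pixels = [p for row in block for p in row]
    mean = sum(pixels) / len(pixels)
    return sum((p - mean) ** 2 for p in pixels) / len(pixels)


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def should_split(block, ref_row, var_threshold=100.0, corr_threshold=0.5):
    """Hypothetical rule: split a block that is busy (high variance) and
    poorly matched by its reference row (low correlation)."""
    busy = block_variance(block) > var_threshold
    poorly_predicted = pearson(block[0], ref_row) < corr_threshold
    return busy and poorly_predicted


flat = [[50] * 4 for _ in range(4)]
checker = [[0, 100], [100, 0]]
print(should_split(flat, [50] * 4), should_split(checker, [100, 0]))
```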

Further, once the coding tree depth has been determined by analyzing the statistics of the image,
the intra mode direction within each prediction unit can be viewed as classification based on
image features. Thus an artificial neural network classifier can be trained to identify the most
probable intra mode direction. [13]
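
To make the classification view concrete, the sketch below runs a small feed-forward network
over a feature vector and returns the most probable modes, which the encoder could then test
instead of all 35. The layer sizes, the 16-element feature vector and the random, untrained
weights are all assumptions made for illustration; no network design has been fixed by this
proposal.

```python
import math
import random

NUM_MODES = 35  # planar, DC and 33 angular modes


def make_layer(n_in, n_out, rng):
    """Create (weights, biases) with small random weights (untrained)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def softmax(z):
    """Convert scores to probabilities, shifted by the maximum for stability."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [v / total for v in exps]


def forward(features, layers):
    """Forward pass: tanh on hidden layers, softmax on the output layer."""
    x = features
    for i, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
        if i < len(layers) - 1:
            x = [math.tanh(v) for v in x]
    return softmax(x)


def top_k_modes(probs, k=3):
    """Mode numbers of the k most probable classes, most probable first."""
    return sorted(range(len(probs)), key=lambda m: probs[m], reverse=True)[:k]


rng = random.Random(0)
layers = [make_layer(16, 8, rng), make_layer(8, NUM_MODES, rng)]
probs = forward([0.5] * 16, layers)  # hypothetical 16-element feature vector
candidates = top_k_modes(probs, k=3)
print(candidates)
```

With trained weights, only the returned candidates would need a full rate-distortion
evaluation; with the random weights used here the ranking carries no meaning.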


6. PROPOSED WORK

To verify the hypothesis, the following work is proposed:

1. Obtain a large set of sample pixels and corresponding intra mode decisions by modifying the
HM 8.0 software [1][2] to log the relevant data and then compressing the test sequences using
the modified encoder.

2. Develop a method to interpret the logged data to determine the image statistics. Use this to
obtain the statistics for the data samples.

3. Search for a relation between the image statistics and the coding tree depth. Specifically,
determine if the variance within an image and the correlation to the reference pixels can be
used to obtain the quad tree depth.

4. Develop MATLAB code that determines the quad tree depth from the variance and correlation,
and use it to determine quad tree depths for reference samples. Compare the results with the
results of the brute force search employed by HM 8.0.

5. Train a neural network to identify the most probable intra mode direction. Collect the
training data required from the modified HM 8.0 encoder.

6. Develop MATLAB code that uses the neural network to identify the intra mode for reference
samples. Compare these with the results of the HM 8.0 encoder.

7. Study the implementation complexity and verify that the proposed method is less complex
than the brute force method. Also, compare the results with other complexity reduction
techniques [6][7][8].

8. Implement the above model in HM 8.0 and measure the gain in encoding time. Also, measure the
change in BD-bitrate and BD-PSNR [15][16][17]. Compare the results with the results achieved
by other techniques [6][7][8].

9. Use the data to conclude if the proposed solution is a feasible solution to the complexity of
the HEVC encoder. Determine the limitations and possible improvements to the proposed
solution.

REFERENCES

1. B. Bross, W. J. Han, J. R. Ohm and T. Wiegand, “High efficiency video coding (HEVC) text
specification draft 8”, ITU-T/ISO/IEC Joint Collaborative Team on Video Coding (JCT-VC)
document JCTVC-J1003, July 2012.

2. G. J. Sullivan, J.-R. Ohm, W.-J. Han, and T. Wiegand, “Overview of the high efficiency video
coding (HEVC) standard”, IEEE Transactions on Circuits and Systems for Video Technology,
vol. 22, pp. , December 2012.

3. F. Bossen, B. Bross, K. Sühring, and D. Flynn, “HEVC complexity and implementation
analysis”, IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, pp. ,
December 2012.

4. G. J. Sullivan and R. L. Baker, “Efficient quadtree coding of images and video”, IEEE
Transactions on Image Processing, vol. 3, no. 3, pp. 327-31, Jan 1994.

5. Y. Tan, C. Yeo, H. Tan, and Z. Li, “On residual quad-tree coding in HEVC”, in IEEE
International Workshop on Multimedia Signal Processing (MMSP), pp. 1-4, October 2011.

6. S.-W. Teng, H.-M. Hang, and Y.-F. Chen, “Fast mode decision algorithm for residual
quadtree coding in HEVC”, in The Visual Communications and Image Processing (VCIP)
Conference, pp. 1-4, November 2011.

7. K. Choi and E. Jang, “Fast coding unit decision method based on coding tree pruning for
high efficiency video coding”, Opt. Eng. 51(3), 030502-1 to 030502-3,
doi: 10.1117/1.OE.51.3.030502, March 2012.

8. L. Zhao, L. Zhang, S. Ma, and D. Zhao, “Fast mode decision algorithm for intra prediction
in HEVC”, in The Visual Communications and Image Processing (VCIP) Conference, pp. 1-4,
November 2011.

9. W. Jiang, H. Ma, and Y. Chen, “Gradient based fast mode decision algorithm for intra
prediction in HEVC”, in International Conference on Consumer Electronics, Communications and
Networks (CECNet), pp. 1836-1840, April 2012.

10. X. Shen, L. Yu, and J. Chen, “Fast coding unit size selection for HEVC based on Bayesian
decision rule”, in Picture Coding Symposium (PCS), pp. 453-456, May 2012.

11. G. Tian and S. Goto, “Content based hierarchical fast coding unit decision algorithm for
HEVC”, in Picture Coding Symposium (PCS), vol. 1, pp. 56-59, May 2012.

12. C. Fogg, “Suggested figures for the HEVC specification”, ITU-T/ISO/IEC Joint
Collaborative Team on Video Coding (JCT-VC) document JCTVC-J0292r1, July 2012.

13. C. H. Lampert, “Machine Learning for Video Compression: Macroblock Mode Decision”,
in 18th International Conference on Pattern Recognition, 2006 (ICPR 2006), vol. 1,
pp. 936-940.

14. Dr. Gary J. Sullivan, “HEVC: The Next Generation in Video Compression”, Keynote
speech at VCIP 2012, Nov 29th 2012.

15. G. Bjontegaard, “Calculation of average PSNR differences between RD-Curves”, ITU-T
SG16, Doc. VCEG-M33, 13th VCEG meeting, Austin, TX, April 2001.

16. G. Bjontegaard, “Improvements of the BD-PSNR model”, ITU-T SG16 Q.6, Doc. VCEG-AI11,
Berlin, Germany, July 2008.

17. K. Anderson, R. Sjoberg and A. Norkin, “BD measurements based on MOS”, (online), ITU-T
Q6/SG16, document VCEG-AL23, Geneva, Switzerland, July 2009. Available at
http://wfpt3.itu.int/av-arch/video-site/0906_LG/VCEG-AL23.zip

18. J. Lainema et al, “Intra coding of the HEVC standard”, IEEE Trans. Circuits and Systems
for Video Technology, vol. 22, pp. , Dec 2012.

19. G. Correa et al, “Performance and computational complexity assessment of high efficiency
video encoders”, IEEE Trans. Circuits and Systems for Video Technology, vol. 22, pp. ,
Dec 2012.

20. M. Zhou et al, “HEVC lossless coding and improvements”, IEEE Trans. Circuits and
Systems for Video Technology, vol. 22, pp. , Dec 2012.