Multimedia Compression


Jun 24, 2012



B90901134
陳威尹

Why Compress


Raw data are huge.


Audio:

CD quality music

44.1kHz*16bit*2 channel=1.4Mbps


Video:

near-DVD quality true color animation

640px*480px*30fps*24bit≈221Mbps


Impractical for storage and bandwidth
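The two raw bitrates above are plain multiplications. A minimal sketch that reproduces them follows; the function name `raw_bitrate_bps` is made up for this illustration.

```python
def raw_bitrate_bps(samples_per_second, bits_per_sample, channels=1):
    """Uncompressed bitrate in bits per second."""
    return samples_per_second * bits_per_sample * channels

# CD audio: 44.1 kHz, 16 bits per sample, 2 channels.
cd_audio = raw_bitrate_bps(44_100, 16, channels=2)

# Video: 640x480 pixels per frame, 30 frames per second, 24 bits per pixel.
video = raw_bitrate_bps(640 * 480 * 30, 24)

print(f"CD audio: {cd_audio / 1e6:.2f} Mbps")
print(f"Video:    {video / 1e6:.1f} Mbps")
```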

Outline


Generic Compression Overview


Content specific Compression


Lossy Compression

Introduction to Generic Compression Algorithm

Lossless Compression

Generic Compression


Also called Entropy Encoding


Lossless Compression Algorithms


Entropy can be defined as: H = -Σ p_i * log2(p_i), where p_i is the probability of symbol i


Needs statistical knowledge of the data


Well-known algorithms:


Rice coding


Huffman coding


Arithmetic coding
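As a rough illustration of the entropy bound, the sketch below estimates the order-0 entropy of a message from its symbol counts. The message is the 39-symbol input used in the Huffman encoding example; the function name is an assumption for this sketch.

```python
import math
from collections import Counter

def entropy_bits_per_symbol(data):
    """Order-0 Shannon entropy: -sum(p * log2(p)) over the symbol frequencies."""
    counts = Counter(data)
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

message = "ABACDEAACCAABEAABACBDDABCADDBCEAEAAADBE"
h = entropy_bits_per_symbol(message)

# h * len(message) is a lower bound on the output size of any order-0 coder.
print(f"{h:.3f} bits/symbol, {h * len(message):.1f} bits in total")
```

No order-0 entropy coder can beat this total on average, which gives a yardstick for the Huffman and arithmetic coders on the following slides.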

Huffman encoding

Input: ABACDEAACCAABEAABACBDDABCADDBCEAEAAADBE

Order-0 model

Symbol  A   B  C  D  E
Count   15  7  6  6  5

Input with a fixed 3-bit code per symbol: 39*3=117 bits

Output: 15*1+(7+6+6+5)*3=87 bits

Compression ratio: 117/87 = 1.34
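The 87-bit figure can be checked with a small Huffman tree builder. This is a sketch using a binary heap, not necessarily how the slides' example was produced; the helper name and the tie-breaking rule are assumptions.

```python
import heapq
from collections import Counter

def huffman_code_lengths(data):
    """Return {symbol: code length in bits} for an order-0 Huffman code."""
    counts = Counter(data)
    # Heap items are (weight, tie_breaker, {symbol: depth so far}).
    # The integer tie_breaker keeps the dicts from ever being compared.
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(sorted(counts.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)   # two lightest subtrees
        w2, _, d2 = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]

message = "ABACDEAACCAABEAABACBDDABCADDBCEAEAAADBE"
lengths = huffman_code_lengths(message)
counts = Counter(message)
total_bits = sum(counts[s] * lengths[s] for s in counts)
print(lengths, total_bits)
```

With these counts, A gets a 1-bit code and the other four symbols get 3-bit codes, whichever way the tie between C and D is broken.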

Property of Huffman encoding


Easy to implement, high encoding speed


Unique Prefix Property: no code is a prefix of any other code


Adaptive Huffman encoding:


used when statistical knowledge is not available


updates the Huffman tree when needed

Arithmetic Encoding


Symbol X, Y

prob(X) = 2/3

prob(Y) = 1/3
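A minimal sketch of the interval narrowing behind arithmetic encoding, using the two-symbol model above and exact fractions. It assumes X owns the lower part of the unit interval, [0, 2/3), and Y the upper part, [2/3, 1); a practical coder would use finite-precision integers and emit bits incrementally.

```python
from fractions import Fraction

# Cumulative ranges [low, high) on the unit interval (ordering is an assumption).
RANGES = {
    "X": (Fraction(0), Fraction(2, 3)),
    "Y": (Fraction(2, 3), Fraction(1)),
}

def encode_interval(symbols):
    """Narrow [0, 1) once per symbol; any number in the result identifies the message."""
    low, width = Fraction(0), Fraction(1)
    for s in symbols:
        s_low, s_high = RANGES[s]
        low += width * s_low          # uses the width from before this symbol
        width *= s_high - s_low
    return low, low + width

def decode(value, count):
    """Recover `count` symbols from a number inside the final interval."""
    out = []
    for _ in range(count):
        for s, (s_low, s_high) in RANGES.items():
            if s_low <= value < s_high:
                out.append(s)
                value = (value - s_low) / (s_high - s_low)
                break
    return "".join(out)

low, high = encode_interval("XY")
print(low, high)
print(decode(low, 2))
```

The final width equals the product of the symbol probabilities. For "XXX" it is 8/27, which can be identified with fewer than 2 bits, whereas a Huffman code needs at least 1 bit per symbol, so 3 bits.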


Property of Arithmetic Encoding


Avoids the entropy wasted by Huffman coding, because the number of bits used to represent a symbol can be non-integer


Output is about 5~10% smaller than with Huffman coding


Computationally intensive


US patented!!


Both Huffman and Arithmetic are used in
the entropy encoding stage in JPEG

Application of General Compression


Generic file compression like Zip, Rar, gzip, bzip, etc.


Final stage of content-specific compression


JPEG uses Huffman or Arithmetic


Monkey’s Audio (ape) uses Rice


Lossless Audio (La) uses Arithmetic
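Rice coding, listed above, is simple enough to sketch in a few lines. The version below writes the quotient in unary (ones terminated by a zero) followed by a k-bit remainder. That bit convention is an assumption, and this is not the exact scheme used by Monkey's Audio.

```python
def rice_encode(n, k):
    """Encode a non-negative integer n with Rice parameter k as a bit string."""
    if n < 0 or k < 0:
        raise ValueError("n and k must be non-negative")
    q = n >> k                    # quotient, written in unary
    r = n & ((1 << k) - 1)        # remainder, written in k binary digits
    unary = "1" * q + "0"
    binary = format(r, f"0{k}b") if k > 0 else ""
    return unary + binary

def rice_decode(bits, k):
    """Decode one Rice codeword located at the start of `bits`."""
    q = bits.index("0")           # number of leading ones
    r = int(bits[q + 1:q + 1 + k], 2) if k > 0 else 0
    return (q << k) | r

print(rice_encode(9, 2))
print(rice_decode(rice_encode(9, 2), 2))
```

Small values get short codes, which suits the small residuals left after audio prediction.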

Content specific Compression

Further De-correlation

De-correlation


Correlation means redundancy


However, a general algorithm may not find content-specific correlation


A higher-order general algorithm may not be efficient enough


Whether lossy or lossless, multimedia file formats use a content-specific pre-filter as the 1st step to reduce data redundancy.

Correlation in Multimedia


Audio:


Temporal, Channel


Still Image:


Color space, Spatial, Stereo


Video:


Temporal

Audio Channel Correlation


Correlation between L/R channels


L/R to mid/side conversion


More complex decorrelation with more channels
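One way to decorrelate the L/R channels losslessly is an integer mid/side transform. The sketch below is an assumption about how such a transform can be written, not the exact one used by any particular codec.

```python
def to_mid_side(left, right):
    """Convert integer L/R samples to (mid, side) without losing information."""
    side = left - right
    mid = (left + right) >> 1     # floor of the average; the dropped bit equals side & 1
    return mid, side

def to_left_right(mid, side):
    """Exact inverse of to_mid_side."""
    left = mid + ((side + (side & 1)) >> 1)
    right = left - side
    return left, right

# Highly correlated channels give a small side signal, which is cheap to code.
left = [1000, 1002, 998, 1001]
right = [999, 1001, 1000, 1001]
pairs = [to_mid_side(l, r) for l, r in zip(left, right)]
print(pairs)
print([to_left_right(m, s) for m, s in pairs])
```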

Color Space Correlation


Correlation between color channels


Map RGB to YUV color space

Y = 0.299*R + 0.587*G + 0.114*B

U = -0.169*R - 0.331*G + 0.500*B + 128.0

V = 0.500*R - 0.419*G - 0.081*B + 128.0
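The three equations translate directly into code. This sketch works on a single 8-bit pixel and returns unrounded floats; rounding and clamping to 0..255 are left out and would be needed in practice.

```python
def rgb_to_yuv(r, g, b):
    """Map one 8-bit RGB pixel to YUV using the coefficients from the slide."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.169 * r - 0.331 * g + 0.500 * b + 128.0
    v = 0.500 * r - 0.419 * g - 0.081 * b + 128.0
    return y, u, v

# A gray pixel carries no color information, so U and V sit at the 128 midpoint.
print(rgb_to_yuv(100, 100, 100))
print(rgb_to_yuv(255, 0, 0))
```

Natural images are close to gray in many regions, so U and V vary little and compress well. This matches the per-channel sizes in the example that follows.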



Example in PNG



Color Space Correlation -- RGB to YUV Conversion

Channel  Y     U     V
Size     97KB  32KB  37KB

Channel  R     G     B
Size     95KB  96KB  98KB

Video Channel Correlation


Multi-view channels in 3D video


Convert to image and depth channels


Disparity estimation (like motion estimation)


Video Temporal Correlation


Similarity between adjacent frames


Motion estimation and motion compensation (mostly lossy)

Figure: motion vector and search range between the current frame and the reference frame
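Motion estimation is commonly done by block matching. The sketch below is an exhaustive search that minimizes the sum of absolute differences (SAD) within the search range. The function names, the frame layout (lists of rows), and the sign convention (the vector points from the current block to its match in the reference frame) are assumptions.

```python
def sad(cur, ref, cx, cy, rx, ry, n):
    """Sum of absolute differences between two n x n blocks."""
    return sum(
        abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
        for j in range(n)
        for i in range(n)
    )

def best_motion_vector(cur, ref, cx, cy, n, search):
    """Full search: return (dx, dy, cost) of the best match within +/- search pixels."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            if 0 <= rx <= width - n and 0 <= ry <= height - n:
                cost = sad(cur, ref, cx, cy, rx, ry, n)
                if best is None or cost < best[0]:
                    best = (cost, dx, dy)
    return best[1], best[2], best[0]

# Reference frame: a simple gradient. Current frame: same content moved
# 2 pixels right and 1 pixel down, with zeros filling the uncovered border.
ref = [[10 * y + x for x in range(8)] for y in range(8)]
cur = [[ref[y - 1][x - 2] if y >= 1 and x >= 2 else 0 for x in range(8)]
       for y in range(8)]

dx, dy, cost = best_motion_vector(cur, ref, cx=3, cy=3, n=2, search=2)
print(dx, dy, cost)  # the match lies 2 pixels left and 1 pixel up: -2 -1 0
```

Only the motion vector and the (here zero) residual need to be stored for the block, instead of its pixels.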




Lossless is not enough!


The best lossless audio and image compression normally reduces data to about half its size


Lossy audio compression such as mp3 or ogg achieves a 1/20 ratio while retaining acceptable quality, and a 1/5 ratio for impeccable quality


Lossy video compression reduces a film to 1/300 of its size

Lossy Compression

Loss of data leads to a higher compression ratio


Lossy Compression


Massively reduce the information we don’t notice


Highly content specific


Psychology

Lossy Audio Compression


Frequency domain


Quantization


Importance varies between bands


Higher frequency, larger quantization step


Psychoacoustics


Pitch resolution of the ear is only 2Hz without beating


Threshold of hearing varies between bands


Simultaneous and temporal masking effects
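Band-dependent quantization can be sketched as dividing each frequency-domain coefficient by its band's step size and rounding. The coefficients and step sizes below are invented for illustration; real codecs derive the steps from a psychoacoustic model.

```python
def quantize(coeffs, steps):
    """Map each coefficient to an integer level using its band's step size."""
    return [round(c / s) for c, s in zip(coeffs, steps)]

def dequantize(levels, steps):
    """Reconstruct approximate coefficients from the integer levels."""
    return [q * s for q, s in zip(levels, steps)]

# Hypothetical coefficients ordered from low to high frequency,
# with a larger step (coarser resolution) for higher bands.
coeffs = [103.0, -47.5, 12.2, -6.9, 3.1, -1.4]
steps = [2, 4, 8, 16, 32, 64]

levels = quantize(coeffs, steps)
restored = dequantize(levels, steps)
print(levels)
print(restored)
```

The high bands collapse to zero, which the entropy coder then stores very cheaply. The reconstruction error in each band is at most half its step.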

Lossy Image Compression


Frequency domain


Discrete Cosine Transform (in Jpeg)


Discrete Wavelet Transform (in J2k)


Quantization


Reduce less important data

Image data -> Transform -> Quantization -> Entropy Coding -> Output data
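The transform stage can be illustrated with a plain 2D DCT on one 8x8 block. This sketch uses the orthonormal DCT-II, written for clarity rather than speed; JPEG's own scaling and level shift are not reproduced here.

```python
import math

def dct_1d(x):
    """Orthonormal DCT-II of a sequence."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * sum(
            x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i in range(n)
        ))
    return out

def dct_2d(block):
    """Apply the 1D DCT to every row, then to every column."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]   # transpose back to [v][u] order

# A flat block has all of its energy in the DC coefficient (upper left).
flat = [[100] * 8 for _ in range(8)]
coeffs = dct_2d(flat)
print(round(coeffs[0][0], 3))
```

Smooth image blocks behave similarly: a few low-frequency coefficients dominate, and quantization can discard most of the rest.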

Jpeg2000 vs. Jpeg

Stage           JPEG                             J2K
Transform       DCT (Discrete Cosine Transform)  DWT (Discrete Wavelet Transform)
Quantization    8x8 Quantization Table           Quantization for each sub-band
Entropy Coding  Huffman Coding                   Arithmetic Coding

Lossy Image Compression in Practice (1)


Original

Lossy Image Compression in Practice (2)


Transform domain coefficients.


Only a few components are visible for each 8x8 block.


The DC component is in the upper left of each block

Lossy Image Compression in Practice (3)


After quantization and IDCT.


Note the clearly visible blocking artifacts.


Compression ratio = 17.8:1 with an SNR of 20.1 dB, not including entropy encoding
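An SNR figure like the one above can be computed from the original and reconstructed samples. The sketch assumes the common definition, 10*log10(signal power / noise power); the slides do not say which definition was used for the 20.1 dB value.

```python
import math

def snr_db(original, reconstructed):
    """Signal-to-noise ratio in dB between two equal-length sample sequences."""
    signal = sum(o * o for o in original)
    noise = sum((o - r) ** 2 for o, r in zip(original, reconstructed))
    if noise == 0:
        return math.inf           # perfect reconstruction
    return 10 * math.log10(signal / noise)

original = [10, 10, 10, 10]
reconstructed = [11, 9, 11, 9]    # every sample is off by 1
print(snr_db(original, reconstructed))
```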

Lossy Video Compression

Motion Estimation

Motion Compensation

Without motion compensation

With motion compensation

Frame Type


Intra Frame (I)


Predictive Frame (P)


Bidirectional predictive Frame (B)

Video Compression Demo


Motion vectors and bandwidth overlaid on mpeg4 video using ffdshow-20041012

Reference


Lossless Compression Algorithms

http://www.cs.cf.ac.uk/Dave/Multimedia/node207.html


Monkey’s Audio

http://www.monkeysaudio.com/theory.html


Lossless Audio (La)

http://www.lossless-audio.com/theory.htm


Compression and speed of lossless audio formats

http://web.inter.nl.net/users/hvdh/lossless/main.htm

http://members.home.nl/w.speek/comparison.htm


http://www.wordiq.com/definition/Wavelet_compression


http://www.wordiq.com/definition/Psychoacoustics


http://www.wordiq.com/definition/MP3


H.264

http://www.komatsu-trilink.jp/device/pdf11/UBV2003.pdf