Presented by Yehuda Dar




Advanced Topics in Computer Vision (048921)

Winter 2011-2012

Video Compression Basics



Fundamental tradeoff among:

Bit-rate


Distortion


Computational complexity

Video Compression Basics



Utilized redundancies:

Spatial

Temporal

Psycho-visual


Statistical



H.264 Overview

H.264 Redundancy Utilization

Redundancy    | Utilization | Means
--------------|-------------|------------------------------------------------------------------
Spatial       | High        | Transform coding; intra coding (spatial prediction)
Temporal      | High        | Motion estimation & compensation
Psycho-visual | Medium      | YCbCr color space; 4:2:0 sampling; DC/AC coefficient quantization
Statistical   | High        | Entropy coding
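
As a small worked example of the chroma subsampling entry, the sketch below counts the samples per frame for the common YCbCr sampling formats. The function name and the 1920x1080 frame size are illustrative assumptions, not taken from the slides.

```python
def samples_per_frame(width, height, chroma_format="4:2:0"):
    """Return the number of (Y, Cb, Cr) samples in one frame."""
    # (horizontal, vertical) chroma subsampling factors for each format
    factors = {"4:4:4": (1, 1), "4:2:2": (2, 1), "4:2:0": (2, 2)}
    sx, sy = factors[chroma_format]
    luma = width * height
    chroma = (width // sx) * (height // sy)  # samples in each chroma plane
    return luma, chroma, chroma


# Illustrative frame size (an assumption, not from the slides)
full = sum(samples_per_frame(1920, 1080, "4:4:4"))
sub = sum(samples_per_frame(1920, 1080, "4:2:0"))
print(sub / full)  # 0.5
```

With 4:2:0 sampling each chroma plane keeps a quarter of its samples, so the raw frame holds half as many samples as with 4:4:4 before any transform or entropy coding is applied.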

Compression using Computer Vision

Motivation:



Better utilization of the psycho-visual redundancy

Application-specific compression methods



Exploring new approaches

A Review of:

A Scheme for Attentional Video Compression

R. Gupta and S. Chaudhury

PAMI 2011

Method Outline



Salient region detection

Foveated video coding

Integration into H.264





Foveated image coding demonstration

Figure from Guo & Zhang, Trans. Image Process., 2010

Saliency Map

Step 1: Creating a 3D Feature Map

Feature type | Calculation method                                  | Based on
-------------|-----------------------------------------------------|-------------------------
Global       | Color spatial variance                              | Liu et al., CVPR 2007
Local        | Center-surround multi-scale ratio of dissimilarity  | Huang et al., ICPR 2010
Rarity       | Pulse-DCT                                           | Yu et al., ICDL 2009

Relevance Vector Machine (RVM)



Used here as a binary classifier

Advantages over support vector machine (SVM):


Provides posterior probabilities


Better generalization ability


Faster decisions

Saliency Map

Step 2: Unify Features using RVM

Training Procedure for MBs:

[Diagram: the global, local and rarity feature maps are each averaged over the MB to form the RVM sample; the ground-truth pixels of the MB are counted to give the RVM label, 'salient' / 'non salient'.]
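
The training procedure for MBs can be sketched roughly as follows. This is a minimal illustration, assuming 16x16 macroblocks, feature maps stored as lists of rows, a 0/1 ground-truth mask, and a hypothetical `min_fraction` threshold for turning the pixel count into a label; the slides do not state the actual threshold.

```python
MB = 16  # macroblock size in pixels (H.264 macroblocks are 16x16)


def mb_mean(feature_map, mb_row, mb_col, mb=MB):
    """Average a 2D map (list of rows) over one macroblock."""
    rows = feature_map[mb_row * mb:(mb_row + 1) * mb]
    values = [v for row in rows for v in row[mb_col * mb:(mb_col + 1) * mb]]
    return sum(values) / len(values)


def mb_training_pair(global_map, local_map, rarity_map, truth_mask,
                     mb_row, mb_col, min_fraction=0.5):
    """Build one (sample, label) pair for the classifier.

    sample: the three feature averages over the macroblock.
    label:  1 ('salient') if the fraction of ground-truth salient pixels
            reaches min_fraction (an assumed threshold), else 0.
    """
    sample = tuple(mb_mean(m, mb_row, mb_col)
                   for m in (global_map, local_map, rarity_map))
    salient_fraction = mb_mean(truth_mask, mb_row, mb_col)  # mask holds 0/1
    label = 1 if salient_fraction >= min_fraction else 0
    return sample, label
```

Each macroblock thus contributes one three-dimensional sample and one binary label to the training set.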

Saliency Map

Step 2: Unify Features using RVM

Trained RVM Usage:

[Diagram: a new input is fed to the trained RVM, which outputs a binary label, 'salient' / 'non salient', and a probability that is used as the relative saliency.]
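
A minimal sketch of using a trained RVM this way is given below. It assumes the standard RVM classification form (a sigmoid applied to a weighted sum of kernel evaluations against the relevance vectors) and a Gaussian kernel; the relevance vectors, weights, bias and `gamma` are hypothetical placeholders for values that training would produce.

```python
import math


def rbf_kernel(x, z, gamma=1.0):
    """Gaussian kernel between two feature vectors (kernel choice is assumed)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def rvm_predict(x, relevance_vectors, weights, bias=0.0, gamma=1.0):
    """Return (binary label, posterior probability) for one MB feature vector."""
    activation = bias + sum(w * rbf_kernel(x, rv, gamma)
                            for w, rv in zip(weights, relevance_vectors))
    probability = 1.0 / (1.0 + math.exp(-activation))  # logistic sigmoid
    label = "salient" if probability >= 0.5 else "non salient"
    return label, probability
```

The probability can be stored per MB as its relative saliency, while the label gives the hard salient / non salient decision.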

Saliency Map: Result Comparison

[Figure panels: input; global; local [Huang et al., ICPR 2010]; rarity [Yu et al., ICDL 2009]; proposed; [Harel et al., NIPS 2006]; [Bruce & Tsotsos, NIPS 2006].]

Figures from Gupta & Chaudhury, PAMI 2011

Saliency Map: ROC Curve

[Figure: ROC curves of the proposed method and of [Harel et al., NIPS 2006].]

Figure from Gupta & Chaudhury, PAMI 2011

Integration Into H.264:

Calculation of Saliency Values

Recalculating saliency map only when it significantly changes

Mutual information between successive frames indicates changes in saliency:

Figures from Gupta & Chaudhury, PAMI 2011
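
The mutual-information test can be illustrated with the sketch below, which estimates the mutual information of two grayscale frames from their joint histogram. The number of quantization levels and the threshold `threshold_bits` are assumptions made for illustration; the slides do not give the decision rule used in the paper.

```python
import math
from collections import Counter


def mutual_information(frame_a, frame_b, levels=8, max_value=255):
    """Mutual information (bits) of two equally sized grayscale frames."""
    def quantize(frame):
        return [min(v * levels // (max_value + 1), levels - 1)
                for row in frame for v in row]

    a, b = quantize(frame_a), quantize(frame_b)
    n = len(a)
    joint, pa, pb = Counter(zip(a, b)), Counter(a), Counter(b)
    # Sum of p(i,j) * log2( p(i,j) / (p(i) * p(j)) ) over observed pairs
    return sum((c / n) * math.log2(c * n / (pa[i] * pb[j]))
               for (i, j), c in joint.items())


def needs_new_saliency_map(prev_frame, cur_frame, threshold_bits=0.5):
    """Recompute saliency when the frames share little information."""
    return mutual_information(prev_frame, cur_frame) < threshold_bits
```

A low value suggests that the frame content has changed substantially, for example at a scene cut, so the saliency map is recomputed; otherwise the existing saliency values are reused.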

Integration Into H.264:

Propagation of Saliency Values

For inter-coded MBs, the saliency value is a weighted average of the saliency values of the MBs pointed to by the motion vector

Figures from Gupta & Chaudhury, PAMI 2011
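
One way to realize this weighted average is sketched below. It assumes integer-pixel motion vectors, 16x16 macroblocks, and weights equal to the overlap area between the motion-compensated block and each reference MB; the weighting actually used in the paper is not given on the slide.

```python
MB = 16  # macroblock size in pixels


def propagate_saliency(ref_saliency, mb_row, mb_col, mv_x, mv_y, mb=MB):
    """Saliency of an inter-coded MB from the reference frame's per-MB values.

    ref_saliency: 2D list with one saliency value per reference-frame MB.
    (mv_x, mv_y): integer-pixel motion vector of the current MB.
    """
    top, left = mb_row * mb + mv_y, mb_col * mb + mv_x
    n_rows, n_cols = len(ref_saliency), len(ref_saliency[0])
    weighted, area = 0.0, 0
    # The displaced block overlaps at most four reference macroblocks
    for r in range(top // mb, (top + mb - 1) // mb + 1):
        for c in range(left // mb, (left + mb - 1) // mb + 1):
            if not (0 <= r < n_rows and 0 <= c < n_cols):
                continue  # part of the block lies outside the frame
            h = min(top + mb, (r + 1) * mb) - max(top, r * mb)
            w = min(left + mb, (c + 1) * mb) - max(left, c * mb)
            weighted += h * w * ref_saliency[r][c]
            area += h * w
    return weighted / area if area else 0.0
```

For example, a motion vector of 8 pixels to the right makes the block straddle two reference MBs equally, so the result is the mean of their saliency values.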

Integration Into H.264:

Salient-Adaptive Quantization

Non-uniform bit allocation

Smaller saliency value => coarser quantization
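
The rule above can be sketched as a per-MB quantization parameter (QP) offset. The linear mapping and the `max_offset` value are illustrative assumptions, not the paper's formula; the QP range 0 to 51 is that of H.264 for 8-bit video.

```python
QP_MIN, QP_MAX = 0, 51  # H.264 quantization parameter range (8-bit video)


def mb_qp(base_qp, saliency, max_offset=6):
    """Map a relative saliency in [0, 1] to a macroblock QP.

    Less salient MBs get a larger QP, i.e. coarser quantization.
    max_offset is a hypothetical tuning parameter; in H.264 an increase
    of 6 in QP doubles the quantizer step size.
    """
    saliency = min(max(saliency, 0.0), 1.0)
    qp = base_qp + round((1.0 - saliency) * max_offset)
    return min(max(qp, QP_MIN), QP_MAX)


print(mb_qp(28, 1.0), mb_qp(28, 0.0))  # 28 34
```

Fully salient MBs keep the base QP, while the least salient ones are quantized with a step size twice as large under these assumed settings, which shifts bits toward the salient regions.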

Integration Into H.264

Figure from Gupta & Chaudhury, PAMI 2011

Paper Evaluation


Novelty:


Methods for:


saliency map


saliency value propagation



Assumption:


All the MBs in P-frames are inter-coded (problematic)




Writing level:


Good


Partially self-contained


Paper Evaluation


Feasibility:

Higher complexity than H.264 encoders

Not for real-time encoders

Useful at low bit-rates



Objects entering the scene may be considered unimportant



Experimental evaluation:


Saliency:


visual comparison: good


ROC curve comparison: partial


Compression:


None (authors’ future direction)


Future Directions



Improving encoding complexity


less complex saliency method



Better object entrance treatment


Using mutual information of frame areas

Treat intra-coded MBs in P-frames

A Review of:

3D Models Coding and Morphing for Efficient Video Compression

F. Galpin, R. Balter, L. Morin, K. Deguchi

CVPR 2004


Method Outline



3D model extraction

3D model-based video coding


Reconstruction using adaptive geometric morphing

3D Models Stream Generation

Figure from Galpin et al., CVPR 2004

Stream Compression



Three data types to compress:


3D model


Texture images


Camera parameters

Texture Image Compression

Reconstruction Process:

Figure from Galpin et al., CVPR 2004

3D Model Compression

The 3D model originates from a decimated depth map

Compressed by:

Wavelet transform

Depth-adaptive quantization

Figures from Galpin et al., CVPR 2004

Video Reconstruction:

Texture Fading

Figure from Galpin et al., CVPR 2004

Video Reconstruction:

Texture Fading

[Figure panels: without texture fading; with texture fading.]

Figures from Galpin et al., CVPR 2004
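
As a rough illustration of texture fading, the sketch below cross-fades two textures with a blending factor that moves from 0 to 1 over the transition. A linear per-pixel blend is an assumption made here for illustration; the slides do not specify the blending function used in the paper.

```python
def fade_textures(texture_a, texture_b, alpha):
    """Blend two equally sized textures (lists of rows of pixel values).

    alpha = 0 returns texture_a, alpha = 1 returns texture_b; values in
    between give a gradual transition instead of an abrupt texture switch.
    """
    return [[(1.0 - alpha) * a + alpha * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(texture_a, texture_b)]
```

Increasing `alpha` frame by frame between two successive textures would avoid a visible jump at the point where the reconstruction switches from one texture image to the next.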

Video Reconstruction:

Geometric Morphing


Improving 3D model interpolation

Figure from Galpin et al., CVPR 2004

Video Reconstruction:

Geometric Morphing

[Figure panels: regular interpolation; interpolation with geometric morphing.]

Figures from Galpin et al., CVPR 2004

Result Comparison with H.264

Paper Evaluation


Novelty:


Compression using unknown 3D model



Assumptions:


Static scene


Moving monocular camera


Neglected camera rotation


GOP intrinsic parameters are fixed




Writing level:


Good


Not self-contained



Paper Evaluation



Feasibility:

Only for static scene video

High encoder/decoder complexity

Unsuitable for real-time use

Useful at very low bit-rates



Experimental evaluation:


Sufficient visual comparison with H.264

No run-time information

Future Directions



Treat moving objects



Improve complexity


At least for real-time decoding

Approach Comparison

                      | Attention | 3D model
----------------------|-----------|-------------
Video type            | Any       | Static scene
Bit-rates useful at   | Low       | Very low
Encoder complexity    | High      | High
Decoder complexity    | Regular   | High
Integration in H.264  | Possible  | Unsuitable
Overall evaluation    | Promising | Inferior