# Understanding the Semantics of Media

24 Nov 2013

Lecture Notes on Video Search & Mining, Spring 2012

Presented by Jun Hee Yoo

Biointelligence Laboratory

School of Computer Science and Engineering

Seoul National University

http://bi.snu.ac.kr

Problem Statement

Semantic Understanding

Some tools attempt to segment video at a higher level, but this level of analysis does not tell us much about the meaning represented in the media.

© 2012, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr


Approach

Segmentation Literature

Use latent semantic indexing (LSI) because it allows us to quantify the position of a portion of the document in a multi-dimensional semantic space.

Propose to summarize the text with LSI and analyze the
signal with smooth Gaussians.

Semantic Retrieval Literature

Use mixtures of probability experts for semantic-audio retrieval (MPESAR), a more sophisticated model connecting words and media.


Analysis Tools

SVD

SVD reduces the dimensionality of a signal in a manner that is optimal in a least-squares sense.

It is used here to reduce the dimensionality of both the audio and the image data of a video.
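As a minimal sketch of this reduction, the NumPy snippet below projects a data matrix onto its top-k singular directions. The function name `reduce_with_svd`, the mean-centering step, and the random 200 x 512 input are assumptions made for illustration; they are not taken from the slides.

```python
import numpy as np

def reduce_with_svd(X, k):
    """Project the rows of X onto the top-k right singular vectors."""
    # Centering is an assumption: it makes the leading directions
    # describe variation around the mean rather than the mean itself.
    mean = X.mean(axis=0)
    Xc = X - mean
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    reduced = Xc @ Vt[:k].T            # shape (n_rows, k)
    approx = reduced @ Vt[:k] + mean   # rank-k least-squares reconstruction
    return reduced, approx

rng = np.random.default_rng(0)
frames = rng.random((200, 512))        # hypothetical: 200 frames x 512 features
reduced, approx = reduce_with_svd(frames, k=10)
```

Keeping more singular vectors can only lower the reconstruction error, which is the least-squares optimality the slide refers to.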

Color Space

$$B_i = 64\,\lfloor \log_2 R \rfloor + 8\,\lfloor \log_2 G \rfloor + \lfloor \log_2 B \rfloor$$

Concatenate into 512 histogram bins.
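The bin formula can be sketched in Python as follows. This assumes 8-bit channel values, so each floored logarithm lies in 0..7 and the three values pack into a 9-bit index (512 bins). Mapping a channel value of 0 to level 0 is also an assumption, since the slide does not say how zeros are handled; `color_bin` and `color_histogram` are hypothetical names.

```python
import numpy as np

# Thresholds such that the count of entries <= c equals floor(log2(c)) for c in 1..255.
POWERS = np.array([2, 4, 8, 16, 32, 64, 128])

def floor_log2(c):
    """floor(log2(c)) for 8-bit values; 0 maps to 0 (an assumption)."""
    return np.searchsorted(POWERS, np.asarray(c), side="right")

def color_bin(r, g, b):
    """Bin index in [0, 511] from 8-bit R, G, B values."""
    return 64 * floor_log2(r) + 8 * floor_log2(g) + floor_log2(b)

def color_histogram(image):
    """512-bin histogram of an (H, W, 3) uint8 image."""
    bins = color_bin(image[..., 0], image[..., 1], image[..., 2])
    return np.bincount(bins.ravel(), minlength=512)
```

The integer threshold lookup is used instead of a floating-point `log2` so that exact powers of two cannot be misbinned by rounding.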

Word Space

Use latent semantic indexing (LSI) with SVD.

To measure distance, use the angle between two vectors:

$$\cos\phi = \frac{\nu_1 \cdot \nu_2}{\lVert \nu_1 \rVert \, \lVert \nu_2 \rVert}$$
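A small sketch of LSI with the cosine measure, under stated assumptions: the term-document counts are toy values invented for illustration, and `lsi_project` and `cos_angle` are hypothetical helper names rather than anything from the lecture.

```python
import numpy as np

def lsi_project(term_doc, k):
    """k-dimensional document vectors from a (terms x documents) count matrix."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    return (np.diag(s[:k]) @ Vt[:k]).T   # one row per document

def cos_angle(v1, v2):
    """Cosine of the angle between two vectors."""
    return float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))

# Toy counts (assumed): rows are the terms "jet", "engine", "weather", "storm".
term_doc = np.array([[2, 1, 0],
                     [1, 2, 0],
                     [0, 0, 3],
                     [0, 1, 2]], dtype=float)
docs = lsi_project(term_doc, k=2)

same_topic = cos_angle(docs[0], docs[1])    # two aviation documents
other_topic = cos_angle(docs[0], docs[2])   # aviation vs. weather
```

In this toy case the two aviation documents end up at a much smaller angle than the aviation and weather documents.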


Segmenting Video

Temporal Properties of Video

Color:

It provides robust evidence for a shot change in a
video signal.

However, it cannot tell us the global structure of the video.

Random words from a transcript:

The words indicate a lot about the overall structure of
the story.


Segmenting Video

Test Material

CNN Headline News (30-minute TV show).

21st Century Jet (documentary).

Use automatic speech recognition (ASR) to provide a transcript of the audio.


Segmenting Video

Scale Space

Convert the original signal into scale space.

In scale space, we analyze a signal with many
different kernels.
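One common way to build such a representation is to smooth the signal with Gaussian kernels of increasing width. The sketch below assumes Gaussian kernels, a 3-sigma kernel radius, and edge padding; the function names are hypothetical and the slides do not specify these details.

```python
import numpy as np

def gaussian_kernel(sigma):
    """Normalized Gaussian kernel truncated at 3 sigma (truncation is an assumption)."""
    radius = int(np.ceil(3 * sigma))
    t = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (t / sigma) ** 2)
    return k / k.sum()

def scale_space(signal, sigmas):
    """Smooth a 1-D signal at each scale; returns shape (len(sigmas), len(signal))."""
    signal = np.asarray(signal, dtype=float)
    rows = []
    for sigma in sigmas:
        k = gaussian_kernel(sigma)
        r = len(k) // 2
        padded = np.pad(signal, r, mode="edge")   # repeat end values at the borders
        rows.append(np.convolve(padded, k, mode="valid"))
    return np.stack(rows)

step = np.concatenate([np.zeros(50), np.ones(50)])
ss = scale_space(step, sigmas=[1.0, 4.0, 16.0])
```

Each row of `ss` is the same signal seen at a coarser scale: the step stays in place but its edge becomes progressively softer.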


[Figure panels: Histogram; With Low Pass Filter]

Segmenting Video

Combined Image and Audio Data

Combined color, word, and scale-space analysis. The result is a 20-dimensional vector function of time and scale.


Segmenting Video

Hierarchical Segmentation Results

Color and word autocorrelations for the Boeing 777
video


Segmenting Video

Hierarchical Segmentation Results

Grouping 4-8 sentences produces a larger semantic autocorrelation.
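One plausible reading of this result can be sketched as follows: sum the semantic vectors of consecutive sentences into groups, then average the cosine between groups a fixed lag apart. The synthetic "topic plus noise" data, the noise level, and the function `lag_correlation` are all assumptions for illustration, not the lecture's exact procedure.

```python
import numpy as np

def lag_correlation(vectors, group=1, lag=1):
    """Mean cosine between sums of `group` consecutive vectors, `lag` groups apart (lag >= 1)."""
    v = np.asarray(vectors, dtype=float)
    n = (len(v) // group) * group                 # drop any incomplete final group
    grouped = v[:n].reshape(-1, group, v.shape[1]).sum(axis=1)
    unit = grouped / np.linalg.norm(grouped, axis=1, keepdims=True)
    return float(np.mean(np.sum(unit[:-lag] * unit[lag:], axis=1)))

# Synthetic data (assumed): 10 topics of 40 sentences each, in a 10-dimensional space.
rng = np.random.default_rng(0)
topics = rng.normal(size=(10, 10))
topics /= np.linalg.norm(topics, axis=1, keepdims=True)
sentences = np.repeat(topics, 40, axis=0) + 0.5 * rng.normal(size=(400, 10))

single = lag_correlation(sentences, group=1)
grouped4 = lag_correlation(sentences, group=4)
```

Summing a few sentences averages out per-sentence noise while the shared topic accumulates, so neighbouring groups correlate more strongly than neighbouring single sentences.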


Segmenting Video

Intermediate Results

A scale-space segmentation algorithm produced a boundary map showing the edges in the signal.
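A boundary map of this kind can be sketched by marking, at each scale, the local maxima of the smoothed signal's rate of change. The derivative-of-Gaussian kernel, the edge padding, and the 0.5 relative threshold below are assumptions chosen for the sketch; `boundary_map` is a hypothetical name.

```python
import numpy as np

def boundary_map(signal, sigmas, rel_threshold=0.5):
    """signal: (T, D) array. Returns a boolean (len(sigmas), T) map of edges."""
    signal = np.asarray(signal, dtype=float)
    T, D = signal.shape
    out = np.zeros((len(sigmas), T), dtype=bool)
    for i, sigma in enumerate(sigmas):
        r = int(np.ceil(3 * sigma))
        t = np.arange(-r, r + 1)
        g = np.exp(-0.5 * (t / sigma) ** 2)
        dg = -t * g / (g.sum() * sigma ** 2)       # derivative-of-Gaussian kernel
        padded = np.pad(signal, ((r, r), (0, 0)), mode="edge")
        deriv = np.stack([np.convolve(padded[:, d], dg, mode="valid")
                          for d in range(D)], axis=1)
        mag = np.linalg.norm(deriv, axis=1)        # change magnitude over all dimensions
        mid = mag[1:-1]
        # Edge: local maximum of the magnitude that exceeds the (arbitrary) threshold.
        out[i, 1:-1] = ((mid > mag[:-2]) & (mid >= mag[2:])
                        & (mid > rel_threshold * mag.max()))
    return out

sig = np.zeros((100, 2))
sig[50:, 0] = 1.0
sig[50:, 1] = -2.0
bm = boundary_map(sig, sigmas=[1.0, 2.0, 4.0])
```

Rows of `bm` correspond to scales and columns to time, so an edge that persists across many rows is a candidate for a high-level segment boundary.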


Segmenting Video

A comparison with ground truth.

Left: estimated result.

Right: ground truth.


Segmenting Video

Shot Boundary Segmentation.

Use a commercial product designed by YesVideo.


Segmenting Video

Manual Segmentation result


Semantic Retrieval


MPESAR process

Semantic Retrieval

Acoustic Signal processing chain

Acoustic to Semantic Lookup


Semantic Retrieval


Testing

Retrieval Results


Histogram of true label ranks based on likelihoods from audio-to-semantic tests

Histogram of true label ranks based on likelihoods from semantic-to-acoustic tests
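A histogram of true label ranks can be computed as in the sketch below, which assumes one likelihood per (query, label) pair and one known true label per query. The toy scores and the function name `true_label_ranks` are invented for illustration and are not results from the lecture.

```python
import numpy as np

def true_label_ranks(likelihoods, true_labels):
    """Rank (1 = best) of each query's true label when labels are sorted by likelihood."""
    likelihoods = np.asarray(likelihoods, dtype=float)
    true_scores = likelihoods[np.arange(len(true_labels)), true_labels]
    # Rank = 1 + number of labels that score strictly higher than the true label.
    return 1 + np.sum(likelihoods > true_scores[:, None], axis=1)

# Toy likelihoods (assumed): 3 queries x 3 candidate labels.
scores = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.3, 0.6],
                   [0.5, 0.4, 0.1]])
ranks = true_label_ranks(scores, np.array([0, 1, 1]))
hist = np.bincount(ranks, minlength=scores.shape[1] + 1)[1:]   # counts for ranks 1, 2, 3
```

A retrieval model that works well concentrates the histogram mass at rank 1.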