VIDEO RETRIEVAL AND USER INTERACTION


17 Nov 2013


VIDEO RETRIEVAL AND USER INTERACTION
AND DIGITAL RIGHTS MANAGEMENT

From Multimedia Retrieval, Springer, Blanken et al.

“Multimodal” is the keyword…


Based on a case study


Formula race cars video recordings


Fusion of multimodal information


Sound


Audio signal analysis to detect interesting events


when the commentator gets excited


At the beginning of an event, there is an overview by commentator


They capture the audio signal and filter out the part outside the voice frequency range


They also look for specific words


not general voice recognition, but searching only for a handful of race-specific words




Fusion


Audio


Analysis of image stream


To catch start of race and other events


Used to locate time boundaries of isolatable events


Superimposed text


Projected on the TV screen


Information on the driver


Driver’s place in race, etc.
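A minimal sketch of how scores from the three streams (audio, image, superimposed text) could be fused. The slides do not say which fusion method the case study uses, so the weighted average below, the modality names, and the threshold are all assumptions chosen for illustration.

```python
def fuse_scores(modality_scores, weights):
    """Weighted average of per-segment scores across modalities.

    modality_scores: dict mapping a modality name to a list of scores in [0, 1],
    one score per video segment (all lists have the same length).
    weights: dict mapping the same names to non-negative weights (assumed values).
    """
    names = list(modality_scores)
    n_segments = len(modality_scores[names[0]])
    total_weight = sum(weights[m] for m in names)
    return [
        sum(weights[m] * modality_scores[m][i] for m in names) / total_weight
        for i in range(n_segments)
    ]


def interesting_segments(fused, threshold=0.6):
    """Indices of segments whose fused score reaches the (assumed) threshold."""
    return [i for i, score in enumerate(fused) if score >= threshold]


# Hypothetical scores for three consecutive segments.
scores = {
    "audio": [0.1, 0.9, 0.2],  # e.g. commentator excitement
    "image": [0.2, 0.8, 0.9],  # e.g. colour or shape change
    "text": [0.0, 1.0, 0.0],   # e.g. a caption appeared
}
fused = fuse_scores(scores, {"audio": 1.0, "image": 1.0, "text": 1.0})
print(interesting_segments(fused))
```

Only the middle segment is flagged here, because it is the only one where all three streams agree; a single strong stream (the third segment's image score) is not enough at this threshold.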



Audio processing


Mix of human speech, car noise, background noise, crowd cheering, horns


Look for human voice frequency


Short-time energy (STE)


To remove noise


Waveform-based


Pitch


fundamental frequency (F0): the higher it is, the more excitement in the voice


Search for phonemes


Pause rate


to detect quantity of speech


Keyword spotting


captures less semantics than general speech recognition, but has a lower error rate
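The audio features listed above (short-time energy, pause rate, pitch) can be sketched as follows. This is an illustration under stated assumptions, not the case study's implementation: the frame length, the energy threshold, the 80 to 400 Hz search range, and the use of an autocorrelation peak to estimate F0 are all choices made for the example.

```python
import math


def frame_signal(samples, frame_len, hop):
    """Split a list of samples into frames of frame_len, advancing by hop."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def pause_rate(energies, threshold):
    """Fraction of frames whose energy falls below the threshold."""
    if not energies:
        return 0.0
    return sum(1 for e in energies if e < threshold) / len(energies)


def estimate_f0(frame, sample_rate, f_min=80.0, f_max=400.0):
    """Estimate F0 from the strongest autocorrelation peak in [f_min, f_max]."""
    lag_min = int(sample_rate / f_max)
    lag_max = min(int(sample_rate / f_min), len(frame) - 1)
    best_lag, best_val = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        val = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return sample_rate / best_lag if best_lag else 0.0


# Synthetic test signal: a 200 Hz tone followed by silence of equal length.
sr = 8000
tone = [math.sin(2 * math.pi * 200 * n / sr) for n in range(400)]
signal = tone + [0.0] * 400
energies = [short_time_energy(f) for f in frame_signal(signal, 100, 100)]
print(pause_rate(energies, 0.01), estimate_f0(tone, sr))
```

A rising F0 together with a low pause rate would then be a cue for an excited commentator; how these cues are thresholded and combined is left open here.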


Image stream


Searched for places where the commentator raised his voice


Searched histogram, looking for certain colors and shapes


Tracked the changing of colors and shapes over a series of frames


Focus on


Start of race


Passing


Fly-outs (sand and dust)
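Tracking color changes over a series of frames can be sketched with a coarse color histogram per frame and a distance between consecutive histograms. The bin count, the L1 distance, the threshold, and the pixel values below are assumptions for illustration; the slides do not specify them, and shape tracking is not covered.

```python
def color_histogram(frame, bins=4):
    """Normalized joint RGB histogram of a frame.

    frame: non-empty list of (r, g, b) pixels with channel values in 0..255.
    """
    hist = [0.0] * (bins ** 3)
    step = 256 // bins
    for r, g, b in frame:
        idx = (r // step) * bins * bins + (g // step) * bins + (b // step)
        hist[idx] += 1.0
    total = len(frame)
    return [h / total for h in hist]


def histogram_distance(h1, h2):
    """L1 distance between two normalized histograms (0 = identical, 2 = disjoint)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def detect_changes(frames, threshold=0.5):
    """Indices of frames whose histogram differs sharply from the previous frame."""
    hists = [color_histogram(f) for f in frames]
    return [i for i in range(1, len(hists))
            if histogram_distance(hists[i - 1], hists[i]) > threshold]


# Hypothetical clip: three grey "track" frames, then two sand-colored frames.
track = [(90, 90, 90)] * 16
dust = [(200, 180, 120)] * 16
print(detect_changes([track, track, track, dust, dust]))
```

The sudden shift toward sand-like colors at the fourth frame is the kind of cue that could indicate a fly-out.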

Text


Two classes


Scene text


Superimposed text


The same text can span many frames, so they rely on its position being fixed to limit processing time
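One way to exploit the fixed position of superimposed text is to look for pixels that stay nearly constant while the rest of the picture changes, then restrict later processing to that region. The sketch below assumes grayscale frames stored as nested lists and an arbitrary tolerance; it is an illustration, not the method used in the case study.

```python
def static_pixel_mask(frames, tolerance=8):
    """Mark pixels whose grayscale value varies by at most `tolerance` over all frames."""
    height, width = len(frames[0]), len(frames[0][0])
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            values = [frame[y][x] for frame in frames]
            row.append(max(values) - min(values) <= tolerance)
        mask.append(row)
    return mask


def bounding_box(mask):
    """Smallest (top, left, bottom, right) box containing all static pixels, or None."""
    coords = [(y, x) for y, row in enumerate(mask)
              for x, is_static in enumerate(row) if is_static]
    if not coords:
        return None
    ys = [c[0] for c in coords]
    xs = [c[1] for c in coords]
    return (min(ys), min(xs), max(ys), max(xs))


# Hypothetical 4x6 frames: the background changes, a bright caption stays put.
frames = []
for background in (10, 100, 200):
    frame = [[background] * 6 for _ in range(4)]
    frame[3][1:5] = [255] * 4  # caption in the bottom row, columns 1..4
    frames.append(frame)

print(bounding_box(static_pixel_mask(frames)))
```

Text recognition would then only need to run on the returned box, and only again when its content changes.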

Interaction


Ways to pose queries


Ways to give feedback


Ways to explore

Interaction types


Retrieval


Query formulation


Concept based


Content based


Concept-based


Key words in natural language


People use different words for the same thing


Metadata is often missing


Easy for user, hard for software


Content-based


Query by example paradigm


User provides examples


Dynamic query interaction


Sliders, buttons, etc.


Visual is the key


Of the query


Of the results


Example system, page 299


Interaction cycle is short
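Query by example combined with slider-style controls can be sketched as a nearest-neighbor search in which each slider sets the weight of one feature. The feature names, vectors, and weighted Euclidean distance below are hypothetical and serve only to illustrate the short interaction cycle.

```python
import math


def weighted_distance(a, b, weights):
    """Euclidean distance with one non-negative weight per feature."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def query_by_example(example, collection, weights, top_k=3):
    """Return the names of the top_k items closest to the example.

    collection: dict mapping an item name to its feature vector.
    weights: one weight per feature, e.g. the current slider positions.
    """
    ranked = sorted(
        collection,
        key=lambda name: weighted_distance(example, collection[name], weights),
    )
    return ranked[:top_k]


# Hypothetical two-feature vectors: (amount of red, amount of motion).
clips = {
    "start": [0.9, 0.1],
    "pass": [0.5, 0.5],
    "flyout": [0.1, 0.9],
}
example = [0.5, 0.9]
print(query_by_example(example, clips, weights=[1.0, 0.0], top_k=1))
print(query_by_example(example, clips, weights=[0.0, 1.0], top_k=1))
```

Moving the sliders from "color only" to "motion only" changes the best match from "pass" to "flyout" without the user having to reformulate the query.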

Browsing


Links, with a feeling similar to using the web


Browsing model


To get impression of search space


To find something when you aren’t sure what it is


Browsing a collection of objects and browsing a single object


Example on page 301

User input and relevance feedback


Modalities


Visual, audio, tactile


No user guide needed


If it is speech only, it is difficult to process


Use of ambient intelligence to collect information


Relevance feedback


Binary feedback


Weighted relevance feedback


Personalization


Similar to the 1-to-1 marketing concept


User profiles are used


Users are reluctant to provide profile information, though


Users are grouped into content interest groups
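Binary and weighted relevance feedback can both be illustrated with a Rocchio-style query update, a standard information-retrieval technique that the slides do not name; it is swapped in here as an example. The alpha, beta, and gamma values and the feature vectors are assumptions.

```python
def rocchio_update(query, feedback, alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query toward relevant items and away from non-relevant ones.

    feedback: list of (vector, weight) pairs. A positive weight marks a relevant
    item, a negative weight a non-relevant one. Binary feedback uses +1 and -1;
    weighted feedback uses graded values such as +3 or -2.
    """
    def weighted_mean(items):
        total = sum(w for _, w in items)
        if total == 0:
            return [0.0] * len(query)
        return [sum(v[i] * w for v, w in items) / total
                for i in range(len(query))]

    relevant = [(v, w) for v, w in feedback if w > 0]
    non_relevant = [(v, -w) for v, w in feedback if w < 0]
    pos = weighted_mean(relevant)
    neg = weighted_mean(non_relevant)
    return [alpha * q + beta * p - gamma * n
            for q, p, n in zip(query, pos, neg)]


# Binary feedback: one relevant and one non-relevant result.
binary = rocchio_update([1.0, 0.0], [([0.0, 1.0], 1), ([1.0, 0.0], -1)])

# Weighted feedback: the first result is rated three times as relevant.
weighted = rocchio_update([1.0, 0.0], [([0.0, 1.0], 3), ([1.0, 1.0], 1)])
print(binary, weighted)
```

The updated vector is then used as the query for the next retrieval round.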

Presentation


Must provide metadata and data in an integrated way


Inherently multimedia in nature


Tree maps for complex metadata or data


Graphs to put multimedia objects together into single conceptual objects


Starfield display


Breaking videos into segments to aid non-linear searching


Images on pages 314, 315, and 316


Key factors in presenting multimedia data


What capabilities the device has


Limits of device


like size, color, formats of data


Must often change formats of data to fit a device
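Adapting content to a device can be sketched as two small decisions: scale the picture to fit the screen, and pick an encoding the device can play. The function names, the integer rounding, and the format labels are assumptions for illustration.

```python
def fit_to_device(width, height, max_width, max_height):
    """Scale a picture down to fit the screen, keeping its aspect ratio.

    Uses integer arithmetic and never enlarges the picture.
    """
    if width <= max_width and height <= max_height:
        return width, height
    if width * max_height >= height * max_width:
        # Width is the binding constraint.
        return max_width, max(1, height * max_width // width)
    # Height is the binding constraint.
    return max(1, width * max_height // height), max_height


def choose_format(available, supported):
    """First available encoding the device supports, or None if conversion is needed."""
    for fmt in available:
        if fmt in supported:
            return fmt
    return None


print(fit_to_device(1920, 1080, 320, 240))
print(choose_format(["mp4", "3gp"], {"3gp"}))
print(choose_format(["mp4"], {"3gp"}))
```

A result of None from `choose_format` corresponds to the case on the slide where the data format must be changed to fit the device.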

Digital rights


DRM (digital rights management)


Preventative approach


encryption


Reactive approach



Tracking behavior and looking for a violation


Sometimes called forensic tracking


Looking for specific watermarks, often specific to a given user


Makes it hard to pass content on


Application domains


Legal concept: Personal Entertainment Domain (PED)


To keep content secure, commercially and intelligence-wise


Diagrams on pages 325, 326, and 331
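The reactive approach, with watermarks specific to a given user, can be illustrated with a toy least-significant-bit scheme: each copy carries the buyer's ID, so a leaked copy can be traced back. This is a deliberately simple sketch; LSB marks are easy to destroy, real forensic watermarks are designed to survive re-encoding, and the 16-bit ID width is an assumption.

```python
def embed_user_id(samples, user_id, id_bits=16):
    """Return a copy of integer samples with user_id written into the low bits.

    Bit i of user_id replaces the least significant bit of samples[i].
    """
    if len(samples) < id_bits:
        raise ValueError("not enough samples to hold the ID")
    if not 0 <= user_id < (1 << id_bits):
        raise ValueError("user_id does not fit in id_bits")
    marked = list(samples)
    for i in range(id_bits):
        bit = (user_id >> i) & 1
        marked[i] = (marked[i] & ~1) | bit
    return marked


def extract_user_id(samples, id_bits=16):
    """Read the ID back from the least significant bits of the first id_bits samples."""
    return sum((samples[i] & 1) << i for i in range(id_bits))


# Hypothetical content: 32 integer samples; hypothetical customer number 4242.
original = list(range(100, 132))
marked = embed_user_id(original, 4242)
print(extract_user_id(marked))
```

Each sample changes by at most one unit, so the mark is imperceptible in this toy setting while still identifying which user's copy was passed on.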