Prof. D. Petkovic, Prof. R. Jain

1

Visual information Systems:
Lessons for its Future

Prof. D. Petkovic, SFSU

Prof. R. Jain, UC Irvine

Dpetkovic@cs.sfsu.edu

SPIE January 2005, San Jose

2

Goals


Look at past and present of visual information systems
from the standpoint of us, computer science researchers in
content based retrieval, CV, AI, and multimedia


Analyze progress and status


Identify future opportunities and challenges in making
content based retrieval and our work become part of
successful applications


Discuss role of CV, AI and content based retrieval
researchers for future development


Intended to be critical and self-critical, and to call for action
and changes in the way we do the work

Assumption: ultimately, research has to influence real world
applications

3

What are visual information
systems?


4

[Diagram: image and video, with information retrieval, query,
browsing and visualization]

Are visual information systems only this?

(content based retrieval, CV and AI-centric view)

5

[Diagram: image and video at the center, integrated with metadata,
related data, links to related info, location, WWW info, time,
measurements and audio; information retrieval, query, browsing,
visualization and delivery of all the information]

No, they are all this!

6

What do users do with visual
information systems?


Search or browse for images/video of their current interest
and then review/playback/process the results?


But most often: search or browse for information and
knowledge where image/video is but one aspect of it


Entertain


Learn


Explore


Investigate/experiment/evaluate


Communicate


Teach, train


Manage personal data

7

Examples of commercial or near-commercial
visual info systems


Recording and sharing of personal visual data


MyLifeBits (Microsoft Bay Area Research)


http://www.research.microsoft.com/barc/MediaPresence/MyLifeBits.aspx


Internet Photo Albums


http://photos.yahoo.com/ph//my_photos


Scientific research and education


Astronomy: SkyServer (Microsoft Bay Area Research)


http://cas.sdss.org/dr3/en/


Bioinformatics


http://hedgehog.sfsu.edu/home/index.aspx


Cell video


8

Examples (2)


News


http://news.yahoo.com/


Entertainment


http://movies.yahoo.com/


Visual info sharing on the WWW


http://video.search.yahoo.com/


Art


Getty Museum




http://www.getty.edu/art/


Hermitage


http://www.hermitagemuseum.org/fcgi-bin/db2www/qbicSearch.mac/qbic?selLang=English



9

Examples (3)


Remote sensing and surveillance


http://www.landsat.org/



Training and education


http://www.employeeuniversity.com/corporatevideotraining/index.htm


http://coursestream.sfsu.edu/



Biometrics


Face recognition and matching


Fingerprints


Iris



10

Search vs. browse or manually
prepared material


It is not always search/query over indexed collections
(whether they are manually or automatically indexed)


Very often (entertainment, on-line learning) the primary
function is browsing from a limited list of well organized
and manually prepared material.


The “carefully prepare once, show/sell many times” paradigm
justifies the investment in, and the need for, manual expert preparation


Movie trailers are works of art with the purpose of marketing and
sales: a high level of expert manual preparation is required and will
likely stay that way


Currently, the market for “carefully prepare once, show/sell
many times” material is much larger


Search is most often based on current (changing) interests

11

Some history and perspective…


Early nineties (BI, before Internet): excitement of early
discovery and (over)promises


Mid to late nineties: explosion of Internet, things are hot!
Promise of ubiquitously available data. Furious work to
achieve goals (research and startup community). WWW
media emerging. MPEG7 started


2000 and beyond: “crash” of Internet R&D (i.e. it became
ubiquitous). Promises of content based retrieval still
unfulfilled


Visual info systems applications doing well
(media is an integral part of WWW applications) but with
little use of content based retrieval techniques




12

Content based Retrieval: the early dream


It was immediately identified that indexing (i.e. attaching
searchable metadata to) images and video is a big problem


Idea of content based retrieval:


Process images and videos to automatically extract searchable
indices, with heavy use of AI, CV and PR


Indexes to be used for search and retrieval by “similarity” (“show
me images like this”)


Content based retrieval was ultimately supposed to make
indexing and searching of vast image/video databases
automated and economical and reduce or even eliminate
need for text metadata


Great excitement among research community and some
potential customers, and many excellent pieces of work


13

QBIC history


Excitement among researchers at IBM Almaden Research. First
prototype very exciting, generated a lot of related work in research
community. Many good papers and patents (QBIC and others)


Excitement among early customers in art and stock imaging/video


Marketing first skeptical but then started to oversell (e.g. “you do not
need text metadata any more”); we needed to get involved to tone it
down


Transferred into IBM Digital Library and DB2


Business (real $) did not happen:


It was hard to estimate QBIC added value


QBIC search was limited


Too early (this was before or early Internet times)


But QBIC did bring good marketing and attention to multimedia
features of IBM DB2 and DL. It was used successfully as a marketing
tool


http://www.hermitagemuseum.org/fcgi-bin/db2www/qbicSearch.mac/qbic?selLang=English



IBM QBIC group lost some credibility in IBM product divisions


QBIC grew into CueVideo (video + audio indexing and search)

14

Status today: successful visual
information systems applications


15

Characteristics of successful
commercial systems today


Content consists of images/video but also of a variety of
critically important related data (text, audio, prices, links,
measurements etc.) arranged in an easy to use GUI


Indexing and data organization done predominantly
manually with predefined and simple metadata structures
and ontology


Metadata schemas defined by domain professionals, not
computer scientists. Most are very simple. MPEG 7 not
widely used


Search is very simple: title, author, and sometimes a few
keywords against manually entered data


Browsing: by alphabet, time, price, using video key, image
thumbnails, often from manually prepared collections


Content based indexing and retrieval not used



16

How is indexing of images/video
done today?


Manually entered metadata, usually from a fixed list/structure


Defined metadata structure into which the content providers can
publish the content (many standards exist). Most used
standards are relatively simple (e.g. Really Simple Syndication,
RSS: http://blogs.law.harvard.edu/tech/rss)


WWW: crawlers analyze image context: where on the WWW
page the image is, ALT tags, use of the associated text linked to
image etc.


Use of manually generated closed captions for video indexing


Only very rudimentary content based analysis: image type,
dimensions, whether the image is color or B&W, photographic or
clip art etc.


Even basic content based retrieval (color histograms,
composition) practically not used
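
To make the crawler-style “image context” indexing concrete, the sketch below collects text clues for each image on a page (ALT text, the link wrapping the image, and the page title) using Python's standard `html.parser` module. The class name, the choice of clues and the sample page are assumptions for illustration; real crawlers use many more signals.

```python
from html.parser import HTMLParser

class ImageContextParser(HTMLParser):
    """Collect text clues for each <img>: ALT text, enclosing link, page title."""

    def __init__(self):
        super().__init__()
        self.page_title = ""
        self.images = []
        self._in_title = False
        self._current_link = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            self._current_link = attrs.get("href")
        elif tag == "img":
            self.images.append({
                "src": attrs.get("src"),
                "alt": attrs.get("alt") or "",
                "linked_to": self._current_link,
            })

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "a":
            self._current_link = None

    def handle_data(self, data):
        if self._in_title:
            self.page_title += data.strip()

def index_terms(html):
    """Map each image src to sorted index terms taken from its ALT text and the page title."""
    parser = ImageContextParser()
    parser.feed(html)
    title_words = parser.page_title.lower().split()
    return {
        img["src"]: sorted(set(img["alt"].lower().split() + title_words))
        for img in parser.images
    }

# Hypothetical page used only for this example.
page = """<html><head><title>Harbor Photos</title></head>
<body><a href="/detail/1"><img src="boat.jpg" alt="Red sailing boat"></a>
<img src="logo.gif"></body></html>"""
terms = index_terms(page)
```

Note that no pixel is inspected: every index term comes from text surrounding the image, which matches the slide's observation about how indexing is done in practice.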


17

Content based Retrieval: why it
is not enough


Assume content based retrieval worked perfectly. What
could it ultimately do?


Image color and spatial composition


Recognition/matching of some major objects (people, buildings)


Motion, action recognition


Full speech to text


Even this ideal situation is not enough! We also need:


Other info about the image/video (when, who, where, what, related
scientific measurements…)


Who, where, when and why


Related data and links to related data etc.


Integration and synchronization with other sources of data across
semantics, time, location, cause/effect dimensions

...and much more,
none of it recorded in pixels

18

What next?


19

Future opportunities and challenges: some ideas


Improve process of media annotation and indexing
(automated and semi-automated)


Define visual ontology, application specific first, then more
general


Leverage and improve speech recognition, general and
domain specific


Integrate variety of data (media and related data) and
provide unified multimedia modeling and handling


Incorporate time and location search into the mainstream

20

Improve process of media annotation
and indexing




Automate metadata that lend themselves to automation. Leverage
semiautomated means, but pay utmost attention to HCI


Compute indexes based on all related data and clues (WWW links,
tags, audio, GPS etc.)


Allow multimedia annotation to help: annotate text, outline
image/video objects using pointing, add links…


Use power of internet community to enable economical media
annotations


e.g. ESP annotation game by CMU
www.espgame.org


Improve usability to enable annotation at most opportune time and
make it very easy to use (during capture, in free/fun time etc)


Leverage speech (audio tags and speech recognition)


Pay attention to ease of use and GUI


Use time and location


Image and video can be the data but also an index to libraries

21

Define visual ontology, general or
application specific



Define ontology of visual media: structure, terms etc. as
well as related extraction procedures


Not clear if a general ontology is practical: work on
domain specific ones first, then try to generalize


Make it simple and work with domain experts


Offer procedures for automated and semiautomatic
instantiation of ontologies using all available info


Much work already done outside of the CS community (e.g.
domain specific standards for data submissions)

22

Links to some metadata standards (most are
XML based and developed outside of CS)


Dublin Core


http://dublincore.org/documents/2003/04/02/dc-xml-guidelines/



MPEG7 ISO standard for Video (class 8)


http://www.mpeg.org/MPEG/starting-points.html#mpeg7



METS for Digital Libraries


http://www.loc.gov/standards/mets/



AIIM Standards (Enterprise Content Management)


http://xml.coverpages.org/AIMM-Images200104.html


http://xml.coverpages.org/umnImages.html


http://digital.lib.umn.edu/elements.html



Really Simple Syndication (RSS) to be used by Yahoo videos search


http://blogs.law.harvard.edu/tech/rss





23

Leverage and improve speech
recognition, general and domain
specific



Speech and audio have a wealth of information: semantics
and timing. They are easily available, natural and effective
for input


Speech recognition engines are today trained on general
English, with no specific names or domain specific
terms. Problem: the terms most often searched for (names,
specific domain terms) are not in the speech engine, so
develop domain specific speech engines


Leverage speech and audio as annotation medium


Push speech and audio annotation into capture devices


Synchronize, cross-index speech with related textual data
for indexing and increased accuracy
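
One way to picture cross-indexing speech with related text is a single time-stamped inverted index over both sources. The sketch below assumes that a speech recognizer and a slide extractor have already produced `(start_seconds, text)` segments; the data, source names and function names are hypothetical.

```python
from collections import defaultdict

def build_index(timed_sources):
    """Map each word to sorted (start_seconds, source) postings across all sources."""
    index = defaultdict(list)
    for source, segments in timed_sources.items():
        for start_seconds, text in segments:
            for word in set(text.lower().split()):
                index[word].append((start_seconds, source))
    for postings in index.values():
        postings.sort()
    return index

def find(index, word):
    """Return every time offset (with its source) at which the word occurs."""
    return index.get(word.lower(), [])

# Hypothetical recognizer output and slide text for one recorded lecture.
timed_sources = {
    "speech": [(0, "welcome to the lecture"), (95, "the histogram compares colors")],
    "slides": [(90, "Color histogram")],
}
index = build_index(timed_sources)
hits = find(index, "histogram")
```

A term that the recognizer missed can still be found through the slide text, and agreement between the two sources at nearby times is a hint that the match is reliable.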

24

Integrate variety of data (media and related
data) and provide unified multimedia
modeling and handling




Visual information is based on visual media AND related
information (links, text, documents, measurements, slides
etc.): enable integrated indexing, data organization,
search and browse


Leverage time and location in indexing and search


Create unified multimedia data models with unified
storage, indexing and query, across semantics, time,
location


Integrate across variety of data types both semantically, at
GUI level, and at the system level (e.g. cross index video
with slides and text info)


From data to information: the old “chasm” still exists; work
on it, first by solving some concrete applications
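
A minimal sketch of what a unified record with time and location search could look like is given below. The `MediaItem` fields, the `search` parameters and the sample items are assumptions for illustration only; distances use the standard haversine formula with an Earth radius of 6371 km.

```python
from dataclasses import dataclass, field
from datetime import datetime
from math import asin, cos, radians, sin, sqrt

@dataclass
class MediaItem:
    """One record for any media type, carrying time, location and keywords."""
    uri: str
    media_type: str
    captured_at: datetime
    latitude: float
    longitude: float
    keywords: set = field(default_factory=set)

def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers (haversine formula)."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

def search(items, keyword=None, start=None, end=None, near=None, radius_km=None):
    """Filter items by optional keyword, time window and distance from a point."""
    results = []
    for item in items:
        if keyword is not None and keyword not in item.keywords:
            continue
        if start is not None and item.captured_at < start:
            continue
        if end is not None and item.captured_at > end:
            continue
        if near is not None and radius_km is not None:
            if distance_km(near[0], near[1], item.latitude, item.longitude) > radius_km:
                continue
        results.append(item.uri)
    return results

# Hypothetical items; coordinates are approximate city locations.
items = [
    MediaItem("talk.mpg", "video", datetime(2005, 1, 18), 37.33, -121.89, {"retrieval", "talk"}),
    MediaItem("bridge.jpg", "image", datetime(2004, 6, 2), 37.77, -122.42, {"bridge"}),
    MediaItem("campus.jpg", "image", datetime(2005, 3, 9), 33.68, -117.83, {"campus"}),
]
nearby = search(items, near=(37.33, -121.89), radius_km=100)
nearby_2005 = search(items, start=datetime(2005, 1, 1), near=(37.33, -121.89), radius_km=100)
```

The same query interface serves video and images alike, which is the point of a unified model: media type becomes one attribute among several rather than the organizing principle.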


25

But also…..


Work very closely with users and domain experts. Develop
real and complete applications


Take a broader usage, system and application view (vs.
looking for applications of AI, CV and content based
retrieval)


Collaborate with DB, HCI and Internet systems researchers


Leverage all sources of information, not only image and
video


Perform extensive experimental evaluations and participate
in formal benchmarks (see e.g. NIST TREC competition
rules)


Contribute and participate in standards activities


Pay much more attention to GUI and HCI and perform
more formal and complete user evaluations

Not doing this risks making us irrelevant…


26

Acknowledgement


We thank J. Gray, J. Gemmell (Bay Area Microsoft
Research), R. Singh (SFSU), B. Horowitz (Yahoo), A.
Amir (IBM Almaden Research) for comments and
feedback



28

Some history and perspective…


Late eighties and early nineties: excitement of
early discovery and (over)promises


CPU, networks and storage started to enable reasonably good
manipulation, rendering and processing of images


Multimedia appears as a field


DB people are being courted by CV/AI people to broaden their
views and include multimedia data


First projects on content based retrieval: e.g. QBIC (IBM),
PhotoBook (MIT)…


Startup activity: Virage and others


Interest from CV and AI researchers


First joint conferences with DB and CV communities


Many (over) promises that caught the eye of investors, marketers
etc.

29

Some history and perspective…


Mid to late Nineties: explosion of Internet, things
are hot! Furious work to achieve goals


Internet enabled better communication and gradually made images,
then video feasible to manipulate, send and view. Explosive
growth


Advances in compression, networking, CPU, media formats,
standards and storage helped greatly. Cheap capture devices
starting to happen


Multimedia moves and melds slowly into Internet


DB vendors start to embrace multimedia types (blobs, extenders,
blades, cartridges)


Content based retrieval becomes a very popular research topic for
CV and AI. Many conferences and workshops organized


MPEG7 activity started


Availability of research and venture funding continues


First trials, first products with content based retrieval (Virage,
IBM, Informix…)

30

Some history and perspective…


2000 and beyond: wake up, crash of Internet (i.e.
it became ubiquitous). Promises of content based
retrieval still unfulfilled


Internet became a common thing (which is good) but lost its research
appeal: it became a “vehicle”. It is still growing rapidly with
more and more visual data


Explosion of image and video on internet as well as cheap capture
devices (e.g. phones capturing audio+image+video+text+GPS)


Further advances in networking, CPU, storage made image and
video ubiquitously available and affordable


Startups based on content based retrieval not doing well or folded


Strong research activity


Most applications resolved by researchers outside of CS


Minimal or no use of content based retrieval in commercial world