IBE312: Information Architecture 2013


15 Nov 2013


Ch. 9: Metadata



Many of the slides in this slideset are reproduced and/or modified content from publicly available slidesets by Paul Jacobs (2012), The iSchool, University of Maryland, http://terpconnect.umd.edu/~psjacobs/s12/INFM700s12.htm.

These materials were made available and licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States license. See http://creativecommons.org/licenses/by-nc-sa/3.0/us/ for details.



Metadata


“Data about data”: definitional and descriptive documentation/information about data…

From the Free On-line Dictionary of Computing:

  Data about data. In data processing, meta-data is definitional data that provides information about or documentation of other data managed within an application or environment.

  For example, meta-data would document data about data elements or attributes (name, size, data type, etc.) and data about records or data structures (length, fields, columns, etc.) and data about data (where it is located, how it is associated, ownership, etc.). Meta-data may include descriptive information about the context, quality and condition, or characteristics of the data.

(Some other definitions.)

Metadata


Why do we need this?


Types of metadata

  Descriptive/subjective/content (e.g. author, subject, keywords, …)
  Administrative (e.g. owner, rights, cost, creation date, version, …)
  Technical (e.g. format, size, dependencies, programs)
  . . . .

In practical terms:

  Metadata helps users locate, navigate, interpret content
  Metadata helps organizations manage content
  Metadata helps systems manipulate content
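
A minimal sketch of the three metadata types as one record, assuming Python. The class name `Metadata`, the helper `find_by_keyword`, and all field values except the license are illustrative and do not come from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Metadata:
    # Descriptive: what the content is about (author, subject, keywords, ...).
    descriptive: dict = field(default_factory=dict)
    # Administrative: who owns and manages it (owner, rights, version, ...).
    administrative: dict = field(default_factory=dict)
    # Technical: what a system needs to process it (format, size, ...).
    technical: dict = field(default_factory=dict)


# Hypothetical record for a slide deck; keyword, version and size values are made up.
slides = Metadata(
    descriptive={"subject": "Metadata", "keywords": ["taxonomy", "thesaurus"]},
    administrative={"rights": "CC BY-NC-SA 3.0 US", "version": 1},
    technical={"format": "text/plain", "size_bytes": 20480},
)


def find_by_keyword(records, keyword):
    """Users locate content through its descriptive metadata."""
    return [r for r in records if keyword in r.descriptive.get("keywords", [])]
```

Keeping the three groups separate mirrors the three audiences above: users search the descriptive part, organizations manage the administrative part, and systems read the technical part.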

Data without Metadata…

7/1/1988   OL   950  20.3    13  0.8  -0.1  33.1  27.8    5.3   5.92
7/2/1988   OL   950  24.2  12.6    1  -0.1  27.8  23.9    3.8   4.56
7/3/1988   OL     .     .     .    .     .     .     .      .      .
7/4/1988   OL   950   0.4  16.3  0.4   0.2    41  34.5    6.5   15.5
7/5/1988   OL  1005  32.9  18.9  1.4   0.3  29.8  23.7    6.1  14.23
7/6/1988   OL  1020  32.3  20.5  1.4   0.3  23.4  18.9    4.5  12.97
7/7/1988   OL  1015  36.8  24.9  1.7   0.5  18.6  15.3    3.2  13.92
7/8/1988   OL   925  42.8  25.6  2.5   0.6  23.7  19.9    3.9  15.18
7/9/1988   OL   945  23.3  27.8  0.7   0.8  27.7  23.5    4.3  12.33
7/10/1988  OL  1030  49.8  26.2  2.6   0.6  40.3    34    6.3  22.14
7/11/1988  OL   940  44.8  25.2  2.5   0.8    34  29.2    4.8  16.76
7/12/1988  OL  1010  47.6  26.9  2.6   0.7  47.3  39.6    7.7  16.13
7/13/1988  OL   945  36.5  22.6  1.9   0.6  36.7  32.6      4   15.5
7/14/1988  OL   950  19.5  18.6  0.4   0.5   302  39.1  262.9  11.07
7/15/1988  OL   955  31.7  15.7  1.5   0.4  29.7    25    4.7   9.49
7/16/1988  OL   955  23.3  14.5  1.8   0.8  23.4  20.7    2.7   8.14
7/17/1988  OL  1015  23.8  16.6  1.6   0.6  27.7  24.1    3.7   9.17
7/18/1988  OL   934  32.9  16.7  2.1   0.7    34  28.9    5.1   9.49
7/19/1988  OL  1010  29.2  20.4  1.9   0.7    26  22.3    3.7  10.44
7/20/1988  OL   952  44.8  24.8  2.1   0.8  31.7  27.5    4.2  10.75
7/21/1988  OL  1029  33.7  37.1  1.9   0.6  34.5  30.1    4.3  12.02
7/22/1988  OL  1017  34.3  32.9    2   0.7  31.4  26.2    5.1  12.65
7/23/1988  OL  1040  35.7  24.6    2   0.8  23.7  20.4    3.3   15.5
7/24/1988  OL   923  47.6  28.9  2.9   0.8  67.3  58.9    8.4  20.87
7/25/1988  OL  1030  58.3  32.6  2.9   0.7    68  59.3    8.7  22.14
7/26/1988  OL   950  49.3  29.2  3.4   0.6    86  75.1   10.9  21.19
7/27/1988  OL  1006  54.1  20.9  3.9   0.6    94  82.8   11.2  25.06
7/28/1988  OL  1010  40.5  16.5  1.7   0.3    41  34.4    6.6   6.54
7/29/1988  OL  1000  25.5  23.6  1.4   0.1    41  35.4    5.6   3.82
7/30/1988  OL  1005  47.9  17.6  0.8   0.1  18.3  15.9    2.3   4.19
7/31/1988  OL  1015    38  22.5  1.5   0.1    30  25.3    4.7   4.44
8/1/1988   OL  1018  21.2   8.8  1.1  -0.1  24.7  21.1    3.6   4.81
8/2/1988   OL  1004  38.5  22.8  2.1   0.3    54  46.8    7.2    9.8
8/3/1988   OL  1011    94  32.6  2.1   0.3  45.5  38.9    6.6   9.49
8/4/1988   OL   955  58.3  43.1  2.5   1.1    41  33.1    7.9    9.8
8/5/1988   OL   951  55.8  42.2  2.1   0.8    38    31      7   8.86

Who: authored it? to contact about data?

What: are contents of database?

When: was it collected? processed? finalized?

Where: was the study done?

Why: was the data collected?

How: were data collected? processed? verified?

… can be pretty useless!
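
One way to make the six questions checkable is to treat them as required fields of a dataset descriptor. This is only a sketch: the `unanswered` helper and the descriptor format are hypothetical, and the only field filled in is the date range of the rows (read here as month/day/year, itself an assumption).

```python
QUESTIONS = ("who", "what", "when", "where", "why", "how")


def unanswered(descriptor):
    """Return the questions a dataset descriptor leaves blank or missing."""
    return [q for q in QUESTIONS if not descriptor.get(q)]


# Hypothetical descriptor: the rows themselves reveal only a date range.
descriptor = {"when": "1988-07-01 to 1988-08-05"}

print(unanswered(descriptor))
# ['who', 'what', 'where', 'why', 'how']
```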


Early Example of Metadata

Menagerie of Terms


Classification


Hierarchies


Epistemology


Directories


Controlled vocabularies


Knowledge representation

Let’s focus on significant differences.

Let’s focus on advantages/disadvantages.

Let’s focus on how each is useful.


Controlled Vocabulary


Any defined subset of natural language


List of equivalent terms (synonym rings)
  Use search logs.

List of preferred terms (authority files)
  Commonly also include variant terms
  Educating users, enabling browsing
  Term rotation (pointers in index), p. 201

Classification scheme / taxonomy
  Hierarchical relationships (narrower/broader)

Controlled Vocabulary

Queries can be “exploded” to increase recall
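
A minimal sketch of query explosion with synonym rings, assuming a toy keyword search. The ring contents (apart from “notebook”/“laptop”), the documents, and the function names are hypothetical.

```python
# Hypothetical synonym rings; in practice they might be derived from search logs.
SYNONYM_RINGS = [
    {"notebook", "laptop"},
    {"pda", "handheld"},
]


def explode(query_terms):
    """Expand the query with every term from any ring it touches."""
    expanded = set(query_terms)
    for ring in SYNONYM_RINGS:
        if ring & expanded:
            expanded |= ring
    return expanded


def search(documents, query_terms):
    """Return the ids of documents containing at least one query term."""
    terms = {t.lower() for t in query_terms}
    return sorted(
        doc_id
        for doc_id, text in documents.items()
        if terms & set(text.lower().split())
    )


docs = {1: "cheap laptop deals", 2: "notebook battery life", 3: "desk lamp"}
plain = search(docs, ["laptop"])               # finds document 1 only
exploded = search(docs, explode(["laptop"]))   # also finds document 2
```

The exploded query retrieves more of the relevant documents (higher recall), at the risk of pulling in documents that use a ring member in another sense.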

Controlled Vocabulary

Authority file: inclusive; the preferred term can serve as the unique identifier for a collection of terms; educate users
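
As a rough illustration, an authority file can be held as a mapping from each preferred term to its variants, with the preferred term acting as the identifier for the whole group. The entries and the `normalize` helper below are hypothetical.

```python
# Hypothetical authority file: preferred term -> variant terms.
AUTHORITY_FILE = {
    "notebook": ["laptop", "notebook computer", "lap top"],
}

# Reverse index so that any variant, or the preferred term itself,
# resolves to the preferred term.
TO_PREFERRED = {
    variant: preferred
    for preferred, variants in AUTHORITY_FILE.items()
    for variant in variants
}
TO_PREFERRED.update({preferred: preferred for preferred in AUTHORITY_FILE})


def normalize(term):
    """Map a term to its preferred term, or None if it is outside the vocabulary."""
    return TO_PREFERRED.get(term.strip().lower())
```

Showing the preferred term back to the user (for example “laptop: see notebook”) is one way such a file can educate users about the vocabulary.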


Related Terms & Techniques


Taxonomies


Anything organized in some sort of hierarchical structure


Tagging

  Adding almost any kind of metadata to content, but now often descriptive and user-provided

Thesauri

  Focus on relations between terms
  Focus on “concepts”

Ontologies

  Usually model a specific domain or part of the world
  Generally machine-readable

(Increasing complexity and richness)

Metadata
Taxonomies & Thesauri
Practical Uses

How are taxonomies, tagging, controlled vocabularies and thesauri used?

The semantic gap: What’s the problem?

  Synonymy: roughly, different words or phrases can be used to express similar ideas (e.g. “notebook”, “laptop”)
  Polysemy: roughly, the same word can have different meanings (e.g., “line” (fishing, code, queue, . . .))

Taxonomies try to group similar concepts

“Tags” often assign words to concepts, making it easier to find related concepts

Controlled vocabularies avoid ambiguity (like a specific tag set)

Thesauri represent attempts to better organize mappings between words and concepts

Do these present precision or recall problems?
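
A small worked example can help with this question. It uses the standard definitions (precision = relevant retrieved / retrieved, recall = relevant retrieved / relevant); the document ids and the two scenarios are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Compute (precision, recall) for one query from two sets of document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical collection: documents 1 and 2 are about portable computers.
relevant = {1, 2}

# Synonymy: a search for "laptop" misses document 2, which says "notebook".
synonymy = precision_recall(retrieved={1}, relevant=relevant)

# Polysemy: a search for "laptop OR notebook" also returns document 3,
# which is about paper notebooks.
polysemy = precision_recall(retrieved={1, 2, 3}, relevant=relevant)
```

In this toy setup synonymy shows up as lost recall (precision 1.0, recall 0.5) and polysemy as lost precision (precision 2/3, recall 1.0).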

Taxonomies


Organization of objects according to some principle


Familiar examples:


Linnaean taxonomy (for living organisms)


Web directories (e.g., Yahoo or ODP)


Corporate directories


Organization charts


Organizational structures previously discussed

Tagging, e.g. Flickr: popular tags

Flickr: related tags

Del.icio.us: related tags

Thesauri: Motivation


“Semantic gap” between concepts and words

Online thesauri help map many synonyms or word variants onto one preferred term; improve precision in retrieval (p. 203)

Words are used to evoke concepts

  Concrete objects: MacBook Pro, iPhone
  Abstract ideas: freedom, peace

[Diagram labels: Concepts, Words, Ideas, Meaning]


Thesauri


Book of synonyms, often including related and contrasting words and antonyms.

In this class:

  A controlled vocabulary in which equivalence, hierarchical, and associative relationships are identified for purposes of improved retrieval.

Technical lingo …

Thesauri standards: ISO 2788, …



Thesauri Types



IA Uses of Thesauri


For organization


For navigation


For indexing content


For searching

Applying IA Principles


Focus on users and user needs

  users are different, and have different models

Focus on content

  concepts are different, too
  different levels, words, complexity, vagueness

Examples:

  What’s the difference between laptop, PDA, phone, and convergence device?
  When is “cancer research” “oncology”?
  When a user browses a furniture catalog for chairs, do you show them ottomans and footstools?



Standard Thesaurus Structure

[Diagram]
Broader term: Computer
Preferred term: Notebook (IS-A Computer)
Synonyms (variants): Laptop (AKA Notebook)
Narrower terms: Desktop Replacement, Ultraportable, Tablet PC (each IS-A Notebook)

Semantic relationships in a thesaurus (pp. 204-205)

Abbreviations: PT (preferred term), VT (variant term), BT (broader term), NT (narrower term), RT (related term)

Use (U): VT use PT

Use For (UF): full list of VT on the PT record

Scope Note (SN): meaning of the term, to rule out ambiguity
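
A sketch of how one thesaurus record could carry these relationships, assuming Python. The class `TermRecord`, its field names, and the scope note wording are hypothetical; the terms follow one reading of the Computer / Notebook / Laptop diagram in this deck.

```python
from dataclasses import dataclass, field


@dataclass
class TermRecord:
    preferred: str                                 # PT
    use_for: list = field(default_factory=list)    # UF: all variant terms (VT)
    broader: list = field(default_factory=list)    # BT
    narrower: list = field(default_factory=list)   # NT
    related: list = field(default_factory=list)    # RT
    scope_note: str = ""                           # SN


notebook = TermRecord(
    preferred="Notebook",
    use_for=["Laptop"],
    broader=["Computer"],
    narrower=["Desktop Replacement", "Ultraportable", "Tablet PC"],
    # Illustrative scope note, not taken from any published thesaurus.
    scope_note="Portable personal computer; not a paper notebook.",
)


def use_entries(record):
    """Derive the reciprocal 'VT Use PT' pointers from a PT record's UF list."""
    return {variant: record.preferred for variant in record.use_for}
```

Storing the UF list on the preferred term and deriving the Use pointers from it keeps the two directions of the equivalence relationship from drifting apart.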

Semantic relationships of a wine thesaurus, p. 206


Some Real Examples


Content tagging and social media (e.g. Flickr, del.icio.us)

Special-purpose classification schemes and thesauri (e.g. Art & Architecture Thesaurus (AAT), UMLS)

General semantic tools and classification schemes (e.g., Princeton WordNet, Roget’s Thesaurus)

Art & Architecture Thesaurus

http://www.getty.edu/research/conducting_research/vocabularies/aat/


UMLS (Unified Medical Language System)

Source: National Library of Medicine (NIH)

3 Knowledge Sources, used separately or together:

  Metathesaurus: 1 million+ biomedical concepts from over 100 sources
  Semantic Network: 135 broad categories and 54 relationships between them
  SPECIALIST Lexicon + Tools: lexical information and programs for language processing


E.g. UMLS (Unified Medical Language System)

Source: National Library of Medicine (NIH)

Began in 1986 as long-term R&D project

Designed for systems developers

Develop multi-purpose tools to enhance understanding of medical meaning across systems

Overcome barriers to effective retrieval of machine-readable information

Overcome variety of ways the same concepts are expressed in machine-readable and human language

UMLS Uses

Source: National Library of Medicine (NIH)


Information retrieval


Thesaurus construction


Natural language processing


Automated indexing


Electronic health records (EHR)



Distribution mechanism for

  HIPAA, CHI, PHIN regulatory standards
  SNOMED CT

UMLS Metathesaurus

http://www.nlm.nih.gov/research/umls/



UMLS Thesaurus Browser

http://www.nlm.nih.gov/research/umls/



Semantic Relationships


Equivalence (PT = VT)

Hierarchical: generic (Bird NT Magpie), whole-part (Foot NT Big toe) or instance (Seas NT Mediterranean Sea)

  Faceted / multiple hierarchies

Associative

  Related terms (hammer RT nail)

Preferred terms: form, selection, definition and specificity

Polyhierarchy (Medline cross-lists viral pneumonia under both ... Fig. 9-25, p. 220)

Faceted classification: multiple taxonomies that focus on different dimensions of the content (e.g. wine.com, pp. 223-224)

Associative Term


Poly-Hierarchies

Concepts can have multiple parents

Example (from Shoah Foundation’s thesaurus of Holocaust terms):

  Parents: Cracow (Poland : Voivodship); German death camps
  Concept: Auschwitz II-Birkenau (Poland : Death Camp)
  Children: Block 25 (Auschwitz II-Birkenau); Kanada (Auschwitz II-Birkenau)

What are the advantages and disadvantages?

What’s the relationship to polysemy?
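
A poly-hierarchy can be sketched as a list of (child, parent) pairs in which a child may appear more than once. The parent/child arrangement below is one reading of the Shoah Foundation example on this slide, not a quotation from the thesaurus itself; the function names are hypothetical.

```python
# (child, parent) pairs; a child may be listed under more than one parent.
# Assumed arrangement, read off the example in this deck.
HIERARCHY = [
    ("Auschwitz II-Birkenau", "Cracow (Poland : Voivodship)"),
    ("Auschwitz II-Birkenau", "German death camps"),
    ("Block 25", "Auschwitz II-Birkenau"),
    ("Kanada", "Auschwitz II-Birkenau"),
]


def parents(term):
    """Direct broader terms of a term, in alphabetical order."""
    return sorted(p for c, p in HIERARCHY if c == term)


def ancestors(term):
    """All broader terms reachable by following every parent link."""
    found = set()
    stack = [term]
    while stack:
        for parent in parents(stack.pop()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found
```

With multiple parents a term no longer has a single “You are here” path, which is one of the trade-offs the questions above point at.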

Faceted Hierarchies


Alternative to single and poly-hierarchies

Basic idea:

  Describe objects along multiple facets
  Each facet has its associated hierarchy

Issues:

  What’s a facet?
  How do you navigate faceted hierarchies?
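
A minimal sketch of describing objects along several facets and narrowing by any combination of them. The catalog, facet names, and values are hypothetical (loosely modeled on a wine store), and each facet is kept flat here rather than given its own hierarchy.

```python
# Hypothetical catalog; each item is described along independent facets.
WINES = [
    {"name": "A", "type": "red", "region": "France", "price": "under $20"},
    {"name": "B", "type": "red", "region": "Italy", "price": "$20-$50"},
    {"name": "C", "type": "white", "region": "France", "price": "under $20"},
]


def narrow(items, **selected):
    """Keep the items that match every selected facet value."""
    return [
        item
        for item in items
        if all(item.get(facet) == value for facet, value in selected.items())
    ]


reds = narrow(WINES, type="red")
french_reds = narrow(reds, region="France")
```

Because the facets are independent, the same result is reached whether the user picks type first or region first, which is what makes it easy to narrow, broaden, or shift focus.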

Faceted Browsing Example

Demo:

http://flamenco.berkeley.edu/demos.html

Advantages of Facets


Integrates searching and browsing


Easy to build complex queries


Easy to narrow, broaden, shift focus


Helps users avoid getting lost


Helps to prevent “categorization wars”

Relationship to IA?

[Diagram: Database, Web Server, Application Server, Network]

Ontologies are implicitly “hidden” here!!!

[Diagram: entities Flight, Trip, Airplane; relations Part-of, Equipment; attributes From, To, Departure Time, Arrival Time, Origin, Destination, Type, Capacity]

Rule: Arrival Time is always after Departure Time

Rule: Distance from Origin to Destination is typically > 100 miles
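
The two rules can be made explicit in application code. The sketch below is an assumption about how that might look: the `Flight` class, the `distance_miles` field, and the `check` function are hypothetical, and both times are assumed to be in the same time zone.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Flight:
    origin: str
    destination: str
    departure_time: datetime
    arrival_time: datetime
    distance_miles: float  # assumed field; the slide only names the rule


def check(flight):
    """Return (errors, warnings) for the two rules on the slide."""
    errors, warnings = [], []
    # Hard rule: arrival is always after departure.
    if flight.arrival_time <= flight.departure_time:
        errors.append("arrival time must be after departure time")
    # Soft rule: the distance is typically, not always, over 100 miles.
    if flight.distance_miles <= 100:
        warnings.append("unusually short flight")
    return errors, warnings


# Illustrative values only.
ok = Flight("BWI", "BOS", datetime(2013, 11, 15, 9, 0), datetime(2013, 11, 15, 10, 30), 370)
bad = Flight("X", "Y", datetime(2013, 11, 15, 10, 0), datetime(2013, 11, 15, 9, 0), 50)
```

Treating “always” as an error and “typically” as a warning keeps the distinction between the two kinds of rule visible instead of leaving it hidden in the code.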

Putting it all together…

Three-Layer Architecture: Database, Application Server, Web Server, Network

Two-Layer Architecture: Database, Web Server, Network

Popular Implementation: Apache, MySQL, PHP

[Diagram labels: Content, Metadata, Presentation; SQL Database, PHP/HTML]

[Diagram: hierarchy of nodes A, B, C, D, E, F, G, H]

You are here: A > C > D

Contents at D

Related
  - D
  - E

Hierarchy(child, parent)
Content(id, attribute1, attribute2, attribute3, …)
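
A sketch of how the “You are here” trail could be derived from the Hierarchy(child, parent) table, using Python's built-in sqlite3 module. Only the path A > C > D is given on the slide; the other edges are assumed for illustration, and the code assumes a strict tree with no cycles.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Hierarchy (child TEXT PRIMARY KEY, parent TEXT)")
# Only A > C > D comes from the slide; the remaining edges are made up.
conn.executemany(
    "INSERT INTO Hierarchy VALUES (?, ?)",
    [("A", None), ("B", "A"), ("C", "A"), ("D", "C"), ("E", "C")],
)


def breadcrumb(node):
    """Follow parent links from node up to the root; return the path root-first."""
    path = []
    while node is not None:
        path.append(node)
        row = conn.execute(
            "SELECT parent FROM Hierarchy WHERE child = ?", (node,)
        ).fetchone()
        node = row[0] if row else None
    return " > ".join(reversed(path))


print("You are here:", breadcrumb("D"))
# You are here: A > C > D
```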

Faceted Browsing

Matching Results

Filter by
  - Facet 1 (possible values)
  - Facet 2 (possible values)

Hierarchy(child, parent)
Content(id, attribute1, attribute2, attribute3, …)
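
The “Filter by” panel can be filled by counting rows per facet value in the Content table. This is a sketch using Python's built-in sqlite3 module; the attribute names `type` and `region`, the rows, and the `facet_values` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Content(id, attribute1, attribute2, ...) with hypothetical attribute names.
conn.execute("CREATE TABLE Content (id INTEGER PRIMARY KEY, type TEXT, region TEXT)")
conn.executemany(
    "INSERT INTO Content (type, region) VALUES (?, ?)",
    [("red", "France"), ("red", "Italy"), ("white", "France"), ("red", "France")],
)

# Column names cannot be bound as SQL parameters, so they are checked
# against a whitelist before being placed in the query text.
FACETS = ("type", "region")


def facet_values(facet, filters):
    """Count matching rows per value of `facet`, given the filters already chosen."""
    if facet not in FACETS or any(f not in FACETS for f in filters):
        raise ValueError("unknown facet")
    where = " AND ".join(f"{f} = ?" for f in filters) or "1"
    sql = (
        f"SELECT {facet}, COUNT(*) FROM Content "
        f"WHERE {where} GROUP BY {facet} ORDER BY {facet}"
    )
    return conn.execute(sql, list(filters.values())).fetchall()
```

Calling `facet_values("region", {"type": "red"})` gives the possible region values, with counts, that remain after the user has filtered by type.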

Summary


Meta-data

  General function
  Types of meta-data

Taxonomies and Thesauri

  Role in organizing, navigating and searching content
  General-purpose taxonomies
  Special-purpose taxonomies

Practical use & implementation