Metadata: first principles


15 Nov 2013


Pat Bell

Knowledge, Analysis and Intelligence

Definition

“Metadata is data about data: structured information about a resource”


Instances of metadata

resource: book
metadata: catalogue record

resource: record
metadata: corporate file plan

resource: person
metadata: directory entry

resource: web page
metadata: tags

… (Right click on web page) …
… (Select view source) …

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/1999/REC-html401-19991224/loose.dtd">
<!-- InstanceBeginEditable name="doctitle" -->
<title>HM Revenue &amp; Customs: Child Benefit &amp; Guardian's Allowance</title>
<!-- InstanceBeginEditable name="Metadata" -->
<meta name="title" lang="eng" content="" />
<meta name="description" lang="eng" content="" />
<meta name="keywords" lang="eng" content="" />
<meta name="eGMS.subject.category" lang="eng" scheme="IPSV" content="Tax, Benefits" />
<meta name="DCTERMS.audience" lang="eng" content="all" />
<meta name="DC.creator" lang="eng" content="HM Revenue and Customs" />
<meta name="DC.date.issued" scheme="W3CDTF" content="2006-03-24" />
<meta name="DC.date.modified" scheme="W3CDTF" content="" />
<meta name="eGMS.disposal.review" scheme="W3CDTF" content="2006/04/01" />
<meta name="DC.identifier" scheme="URI" content="" />
<meta name="DC.format" lang="eng" content="text/html"/>
<meta name="DC.language" scheme="ISO639-2/T" content="eng" />
<meta name="DC.publisher" lang="eng" content="HM Revenue and Customs" />
<meta name="eGMS.rights.copyright" lang="eng" content="HM Revenue and Customs" />
<!-- InstanceEndEditable -->

1st principle: one resource, one description

The resource / The metadata

First resource
Title: Mona Lisa
Creator: Da Vinci
Relation:

Second resource
Title: Mona Lisa
Creator: Bell
Relation: (Very distant)
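
As a rough illustration of the one-to-one principle, the two descriptions can be held as two separate records rather than merged into one. This is only a sketch: the identifiers `painting-001` and `copy-001` are made up, and linking the second record to the first through `relation` is an assumption about what the slide intends.

```python
# One record per resource, keyed by a hypothetical identifier.
# The "relation" link from Bell's version to the original is an assumption.
catalogue = {
    "painting-001": {"title": "Mona Lisa", "creator": "Da Vinci", "relation": None},
    "copy-001": {"title": "Mona Lisa", "creator": "Bell", "relation": "painting-001"},
}


def creators_of(title):
    """Return the creators of every resource carrying this title, sorted."""
    return sorted(
        record["creator"]
        for record in catalogue.values()
        if record["title"] == title
    )


# Two resources share a title, but each keeps its own description.
print(creators_of("Mona Lisa"))
```

Keeping the records apart means a search on the title finds both resources, while the creator of each stays unambiguous.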

Uses of metadata: today

Resource discovery
Resource administration
Technical support

search
authentication
navigation
disposal
version control
filtering
Intellectual property rights
preservation

Uses of metadata: tomorrow, the semantic web

“An extension of the web … that will bring structure to the meaningful content of Web pages, creating an environment where software agents roaming from page to page can carry out sophisticated tasks for users”

Tim Berners-Lee et al., Scientific American, 17 May 2001

Uses of metadata: building blocks for the semantic web

Metadata …

… expressed using the Resource Description Framework (RDF) …

… in standardised XML (eXtensible Markup Language) documents.

Find out more at the World Wide Web Consortium (W3C): www.w3.org/
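
To make the chain "metadata, expressed in RDF, serialised as XML" concrete, here is a minimal sketch that writes a few Dublin Core statements as RDF/XML using Python's standard library. The resource URI `http://example.org/child-benefit` is hypothetical, and the helper `describe` is invented for this illustration; the namespace URIs are the published RDF and Dublin Core ones.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"
DCTERMS_NS = "http://purl.org/dc/terms/"


def describe(resource_uri, properties):
    """Serialise {(namespace, name): value} as one rdf:Description."""
    # Register readable prefixes so the output uses rdf:, dc: and dcterms:.
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("dc", DC_NS)
    ET.register_namespace("dcterms", DCTERMS_NS)
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": resource_uri}
    )
    for (namespace, name), value in properties.items():
        ET.SubElement(description, f"{{{namespace}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


rdf_xml = describe(
    "http://example.org/child-benefit",  # hypothetical URI, not from the slides
    {
        (DC_NS, "creator"): "HM Revenue and Customs",
        (DC_NS, "format"): "text/html",
        (DCTERMS_NS, "issued"): "2006-03-24",
    },
)
print(rdf_xml)
```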

Components of metadata: statement

<meta name="title" lang="eng" content="" />
<meta name="description" lang="eng" content="" />
<meta name="keywords" lang="eng" content="" />
<meta name="eGMS.subject.category" lang="eng" scheme="IPSV" content="Tax, Benefits" />
<meta name="DCTERMS.audience" lang="eng" content="all" />
<meta name="DC.creator" lang="eng" content="HM Revenue and Customs" />
<meta name="DC.date.issued" scheme="W3CDTF" content="2006-03-24" />
<meta name="DC.date.modified" scheme="W3CDTF" content="" />
<meta name="eGMS.disposal.review" scheme="W3CDTF" content="2006/04/01" />
<meta name="DC.identifier" scheme="URI" content="" />
<meta name="DC.format" lang="eng" content="text/html"/>
<meta name="DC.language" scheme="ISO639-2/T" content="eng" />
<meta name="DC.publisher" lang="eng" content="HM Revenue and Customs" />
<meta name="eGMS.rights.copyright" lang="eng" content="HM Revenue and Customs" />
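
Each `<meta>` tag in the listing is one statement. A small sketch, using only Python's standard `html.parser`, shows how such statements might be pulled out of a page as (name, scheme, content) triples. The class name `MetaCollector` and the shortened `SOURCE` sample are introduced here for illustration.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect (name, scheme, content) from every <meta name=...> tag."""

    def __init__(self):
        super().__init__()
        self.statements = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> also arrive here by default.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if "name" in attributes:
            self.statements.append(
                (
                    attributes["name"],
                    attributes.get("scheme"),  # None when no scheme is given
                    attributes.get("content", ""),
                )
            )


# A shortened sample based on the listing above.
SOURCE = """
<meta name="DC.creator" lang="eng" content="HM Revenue and Customs" />
<meta name="DC.date.issued" scheme="W3CDTF" content="2006-03-24" />
<meta name="eGMS.subject.category" lang="eng" scheme="IPSV" content="Tax, Benefits" />
"""

collector = MetaCollector()
collector.feed(SOURCE)
for statement in collector.statements:
    print(statement)
```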

Components of metadata: elements

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/1999/REC-html401-19991224/loose.dtd">
<html lang="en"><!-- InstanceBegin template="/Templates/blank.dwt" codeOutsideHTMLIsLocked="false" -->
<head>
<!-- InstanceBeginEditable name="doctitle" -->
<title>HM Revenue &amp; Customs: Child Benefit &amp; Guardian's Allowance</title>
<!-- InstanceEndEditable -->
<meta http-equiv="pics-label" content='(pics-1.1 "http://www.icra.org/ratingsv02.html" l gen true for "http://www.hmrc.gov.uk" r (nz 1 vz 1 lz 1 oz 1 cz 1) gen true for "http://www.hmce.gov.uk" r (nz 1 vz 1 lz 1 oz 1 cz 1) gen true for "http://www.ir.gov.uk" r (nz 1 vz 1 lz 1 oz 1 cz 1) gen true for "http://customs.hmrc.gov.uk" r (nz 1 vz 1 lz 1 oz 1 cz 1))'>
<!-- InstanceBeginEditable name="Metadata" -->
<meta name="title" lang="eng" content="" />
<meta name="description" lang="eng" content="" />
<meta name="keywords" lang="eng" content="" />
<meta name="eGMS.subject.category" lang="eng" scheme="GCL" content="Tax, Benefits" />
<meta name="DCTERMS.audience" lang="eng" content="all" />
<meta name="DC.creator" lang="eng" content="HM Revenue and Customs" />
<meta name="DC.date.issued" scheme="W3CDTF" content="2006-03-24" />
<meta name="DC.date.modified" scheme="W3CDTF" content="" />
<meta name="eGMS.disposal.review" scheme="W3CDTF" content="2006/04/01" />
<meta name="DC.identifier" scheme="URI" content="" />
<meta name="DC.format" lang="eng" content="text/html"/>
<meta name="DC.language" scheme="ISO639-2/T" content="eng" />
<meta name="DC.publisher" lang="eng" content="HM Revenue and Customs" />
<meta name="eGMS.rights.copyright" lang="eng" content="HM Revenue and Customs" />
<!-- InstanceEndEditable -->

<meta name="title" lang="eng" content="" />
<meta name="description" lang="eng" content="" />
<meta name="keywords" lang="eng" content="" />
<meta name="eGMS.subject.category" lang="eng" scheme="IPSV" content="Tax, Benefits" />
<meta name="DCTERMS.audience" lang="eng" content="all" />
<meta name="DC.creator" lang="eng" content="HM Revenue and Customs" />
<meta name="DC.date.issued" scheme="W3CDTF" content="2006-03-24" />
<meta name="DC.date.modified" scheme="W3CDTF" content="" />
<meta name="eGMS.disposal.review" scheme="W3CDTF" content="2006/04/01" />

Elements highlighted: title, description, keywords, subject, audience, creator, date, disposal
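
The dotted names in these tags can be taken apart mechanically: an optional standard prefix, then the element, then an optional refinement (qualifier). The sketch below assumes that reading of the naming pattern; `TermName`, `split_name` and the `KNOWN_PREFIXES` set are hypothetical helpers built only from the prefixes visible in the listing.

```python
from typing import NamedTuple, Optional


class TermName(NamedTuple):
    namespace: Optional[str]   # e.g. "DC", or None for plain HTML names
    element: str               # e.g. "date"
    refinement: Optional[str]  # e.g. "issued", or None if unrefined


# Prefixes seen in the listing; an assumption, not an exhaustive registry.
KNOWN_PREFIXES = {"DC", "DCTERMS", "eGMS"}


def split_name(name):
    """Split a meta name such as 'DC.date.issued' into its parts."""
    parts = name.split(".")
    namespace = None
    if len(parts) > 1 and parts[0] in KNOWN_PREFIXES:
        namespace = parts.pop(0)
    element = parts[0]
    refinement = parts[1] if len(parts) > 1 else None
    return TermName(namespace, element, refinement)


for name in ("title", "DCTERMS.audience", "DC.date.issued", "eGMS.disposal.review"):
    print(name, "->", split_name(name))
```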

Components of metadata: refinements (qualifiers)

<meta name="title" lang="eng" content="" />
<meta name="description" lang="eng" content="" />
<meta name="keywords" lang="eng" content="" />
<meta name="eGMS.subject.category" lang="eng" scheme="IPSV" content="Tax, Benefits" />
<meta name="DCTERMS.audience" lang="eng" content="all" />
<meta name="DC.creator" lang="eng" content="HM Revenue and Customs" />
<meta name="DC.date.issued" scheme="W3CDTF" content="2006-03-24" />
<meta name="DC.date.modified" scheme="W3CDTF" content="" />
<meta name="eGMS.disposal.review" scheme="W3CDTF" content="2006/04/01" />

Refinements highlighted: category, issued, modified, review

2nd principle: dumb-down

A valid value for a refinement must also be valid for the unrefined element

date issued (2007-07-25) is fine for date

date updating frequency (monthly) is not
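
The dumb-down test can be sketched as a check against the unrefined element: strip the refinement and ask whether the value still makes sense. The example below assumes that a valid value for `date` is a calendar date in yyyy-mm-dd form; `ELEMENT_VALIDATORS` and `dumbs_down` are names invented for this illustration.

```python
from datetime import date


def is_calendar_date(value):
    """True if the value parses as an ISO calendar date such as 2007-07-25."""
    try:
        date.fromisoformat(value)
        return True
    except ValueError:
        return False


# One validator per unrefined element; only "date" is modelled here.
ELEMENT_VALIDATORS = {"date": is_calendar_date}


def dumbs_down(element, value):
    """True if a refinement's value is still valid for the bare element."""
    return ELEMENT_VALIDATORS[element](value)


print(dumbs_down("date", "2007-07-25"))  # value of "date issued"
print(dumbs_down("date", "monthly"))     # value of "date updating frequency"
```

A refinement whose values fail this check is better modelled as a separate element than as a qualifier of `date`.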

Components of metadata: encoding schemes

<meta name="title" lang="eng" content="" />
<meta name="description" lang="eng" content="" />
<meta name="keywords" lang="eng" content="" />
<meta name="eGMS.subject.category" lang="eng" scheme="IPSV" content="Tax, Benefits" />
<meta name="DCTERMS.audience" lang="eng" content="all" />
<meta name="DC.creator" lang="eng" content="HM Revenue and Customs" />
<meta name="DC.date.issued" scheme="W3CDTF" content="2006-03-24" />
<meta name="DC.date.modified" scheme="W3CDTF" content="" />
<meta name="eGMS.disposal.review" scheme="W3CDTF" content="2006/04/01" />

Components of metadata: encoding schemes

Two sorts:

Controlled vocabulary (pick list)
e.g. Library of Congress Subject Headings

Syntax (prescribed format)
e.g. date format yyyy-mm-dd

(and you can have free text tags, like Title)
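
The two sorts of encoding scheme lend themselves to two sorts of check: membership in a pick list, and a match against a prescribed format. The sketch below is illustrative only. `SUBJECT_PICK_LIST` is a tiny stand-in for a real controlled vocabulary, and the pattern covers only the yyyy-mm-dd form mentioned on the slide, without checking that the date actually exists.

```python
import re

# Syntax scheme: the yyyy-mm-dd date format (digits and hyphens only).
DATE_FORMAT = re.compile(r"^\d{4}-\d{2}-\d{2}$")

# Controlled vocabulary: a hypothetical two-term pick list.
SUBJECT_PICK_LIST = {"Tax", "Benefits"}


def check_syntax(value):
    """True if the value follows the prescribed yyyy-mm-dd format."""
    return bool(DATE_FORMAT.match(value))


def check_vocabulary(value, pick_list):
    """True if every comma-separated term appears in the pick list."""
    terms = [term.strip() for term in value.split(",")]
    return all(term in pick_list for term in terms)


print(check_syntax("2006-03-24"))
print(check_syntax("2006/04/01"))
print(check_vocabulary("Tax, Benefits", SUBJECT_PICK_LIST))
```

Run against the sample page, the syntax check would flag the `eGMS.disposal.review` value `2006/04/01`, which uses slashes rather than hyphens.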


Components of metadata: values

<meta name="title" lang="eng" content="" />
<meta name="description" lang="eng" content="" />
<meta name="keywords" lang="eng" content="" />
<meta name="eGMS.subject.category" lang="eng" scheme="GCL" content="Tax, Benefits" />
<meta name="DCTERMS.audience" lang="eng" content="all" />
<meta name="DC.creator" lang="eng" content="HM Revenue and Customs" />
<meta name="DC.date.issued" scheme="W3CDTF" content="2006-03-24" />
<meta name="DC.date.modified" scheme="W3CDTF" content="" />
<meta name="eGMS.disposal.review" scheme="W3CDTF" content="2006/04/01" />

3rd principle: appropriate values

Develop policies to support local requirements

But keep in mind wider needs

The metadata can be used by people as well as machines

Summary

Metadata is structured resource description

A very abstract name for more concrete activities

For resource discovery and administration, and technical support

A building block of the semantic web

Three principles: one to one, dumb-down and appropriate values

Statements break down into elements, refinements, encoding schemes and values


The role of the information professional

Not tagging huge numbers of resources for someone else

Part of implementing a system (website, EDRM…)

Part of managing the system

Expert and guardian of standards

Guidance to the people who do the tagging