# Content


5 Dec 2013


Digital Libraries

Models and Content

Goals for tonight

Finish up from last week

the 5 S model more formally

Status of the systems available

Obtaining, describing, indexing content

XML

Dublin Core

Introducing content exchanges (OAI)

Applying the 5S model, informally

Choose a subject area

Stream - what types of data? gif, jpg, avi, docx, pdf, html?

Structure - How are the elements organized? Is there a hierarchy? Are there multiple structures?

Spaces - How will we index the items? How will we divide them into related groups?

Scenarios - What services will we provide? What information do we need to provide those services? What events might happen that we need to plan for?

Societies - Who is the library intended to serve? Remember to include agents and other processes as well as users.

This is the first deliverable for your first project.

More formally: Definitions

Definition: A stream is a sequence whose codomain is a non-empty set.

Definition: A structure is a tuple $(G, L, F)$, where $G = (V, E)$ is a directed graph with vertex set $V$ and edge set $E$, $L$ is a set of label values, and $F$ is a labeling function $F : (V \cup E) \to L$.

See http://www.mathsisfun.com/sets/domain-range-codomain.html for a nice description of domain, range, codomain if you need it.

Structure illustration

[Figure: a Collection node with "includes" edges to Images, Audio files, and Books]

A very simple structure. How might it be enhanced? How would an
index be included? What substructures might be added?

What are the G, L, F, V, E parts of this example?
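One possible answer to the question above, written as a minimal Python sketch. The vertex and edge sets follow the illustration; the label values other than "includes" and the helper `is_structure` are assumptions made for this example, not part of the 5S definition.

```python
# Hypothetical encoding of the "Collection includes ..." illustration as (G, L, F).
V = {"Collection", "Images", "Audio files", "Books"}        # vertex set
E = {("Collection", "Images"),
     ("Collection", "Audio files"),
     ("Collection", "Books")}                                 # directed edge set
L = {"collection", "media type", "includes"}                  # label values (assumed)

# F assigns a label from L to every vertex and every edge.
F = {v: ("collection" if v == "Collection" else "media type") for v in V}
F.update({e: "includes" for e in E})


def is_structure(V, E, L, F):
    """Check the definition: edges join known vertices and F maps V ∪ E into L."""
    edges_ok = all(src in V and dst in V for src, dst in E)
    domain_ok = set(F) == V | E
    codomain_ok = all(label in L for label in F.values())
    return edges_ok and domain_ok and codomain_ok


print(is_structure(V, E, L, F))
```

Adding an index or a substructure would mean adding vertices, edges, and labels, then extending F so it still covers every vertex and edge.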

Definitions, cont’d

Definition: A space is a measurable space, measure space, probability space, vector space, topological space, or metric space.

A vector space is a representation for the set of elements in a collection. The vector representing each element records the characteristics held by that element. It both connects the element to others that are similar and distinguishes it from those that are different.

We will do an exercise to illustrate.

Vector space illustration

Consider a car. What are the characteristics that
you associate with a car?

If you want to compare one car to another, what
characteristics would you choose?

If you wanted to distinguish a car from another type
of vehicle, what characteristics would you need?

distinguish from a snowmobile

distinguish from a truck

Make a vector of those characteristics.

Then, fill in the vector for several specific cars.
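The exercise might be sketched as follows. The four characteristics and all the values are hypothetical choices for illustration, and cosine similarity is just one common way to compare vectors; the class exercise may pick different characteristics and a different comparison.

```python
import math

# Hypothetical characteristics chosen for the exercise.
FEATURES = ["wheels", "seats", "has_cargo_bed", "runs_on_snow"]

# One vector per vehicle, in the order given by FEATURES (values are made up).
vehicles = {
    "sedan":      [4, 5, 0, 0],
    "coupe":      [4, 2, 0, 0],
    "truck":      [4, 3, 1, 0],
    "snowmobile": [0, 2, 0, 1],
}


def cosine(u, v):
    """Cosine similarity: close to 1 for similar vectors, smaller for dissimilar ones."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms


for name, vector in vehicles.items():
    print(name, round(cosine(vehicles["sedan"], vector), 3))
```

With these made-up values the two cars score closer to each other than either does to the snowmobile, which is the behavior the vector representation is meant to capture.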

Definitions - 3

Definition: A scenario is a sequence of related transition events $(e_1, e_2, \ldots, e_n)$ on state set $S$ such that $e_k = (s_k, s_{k+1})$ for $1 \le k \le n$.

More easily visualized, a scenario is a path in a directed graph, $G = (S, \Sigma_e)$, where vertices correspond to states in the state set $S$ and directed edges are equivalent to events in a set of events, $\Sigma_e$, and correspond to transitions between states.

Scenarios must be implemented to make a working
system.
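A minimal sketch of a scenario as a path through states. The state names and the "search and download" service are hypothetical; each event is written as a pair (state before, state after), matching $e_k = (s_k, s_{k+1})$.

```python
# Hypothetical scenario: a user searches the library and downloads an item.
scenario = [
    ("home", "results"),     # e1: user submits a query
    ("results", "record"),   # e2: user opens a record
    ("record", "download"),  # e3: user requests the file
]


def is_path(events):
    """True if each event starts in the state where the previous event ended."""
    return all(prev[1] == nxt[0] for prev, nxt in zip(events, events[1:]))


def states_visited(events):
    """List the states s_1 ... s_(n+1) touched by a non-empty scenario."""
    return [events[0][0]] + [target for _source, target in events]


print(is_path(scenario), states_visited(scenario))
```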

Definitions - 4

Definition: A society is a tuple $(C, R)$ where

$C = \{c_1, c_2, \ldots, c_n\}$ is a set of conceptual communities, each community referring to a set of individuals of the same class or type (e.g. actors, activities, components, hardware, software, data);

$R = \{r_1, r_2, \ldots, r_m\}$ is a set of relationships, each relationship being a tuple $r_j = (e_j, i_j)$ where $e_j$ is a Cartesian product $c_{k_1} \times c_{k_2} \times \cdots \times c_{k_{n_j}}$, $1 \le k_1 < k_2 < \cdots < k_{n_j} \le n$, which specifies the communities involved in the relationship and $i_j$ is an activity.
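To make the definition concrete, here is a small sketch under invented assumptions: the community names, their members, and the activities are all hypothetical. Each relationship names the communities involved plus an activity, and `participants` expands the Cartesian product $e_j$.

```python
from itertools import product

# Hypothetical communities C: each maps a community name to its individuals.
C = {
    "patrons": {"ana", "raj"},
    "librarians": {"kim"},
    "services": {"search"},
}

# Hypothetical relationships R: (communities involved, activity).
R = [
    (("patrons", "services"), "searching"),
    (("patrons", "librarians"), "asking for help"),
]


def participants(relationship):
    """Expand e_j: the Cartesian product of the communities in the relationship."""
    names, _activity = relationship
    return set(product(*(sorted(C[name]) for name in names)))


for rel in R:
    print(rel[1], sorted(participants(rel)))
```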

Projects in our DL laboratory

Mendel 289 is the center of activity for projects
related to digital libraries and similar projects.

Summary of the projects under way, which may
present opportunities for class projects or for
independent study

NSDL, CITIDEL, CSTA, Ensemble, Distributed
Expertise, Computing Ontology, Interdisciplinary
Computing and its relationship to the libraries ….

Our systems

Now available

Fedora Linux machines, remotely accessible (use the gateway)

Bare machines with just basic system

We can install Drupal either from the Drupal site (doing things for ourselves) or from the Bitnami site (builds the stack for us)

I just heard that Drupal may already be installed. Feel free to uninstall and reinstall if you wish.

If you have a computer of your own and want to use it, fine, but you must be able to demonstrate it to the class at the end of the semester. I will need to be able to see what you are doing from time to time during the semester.

That means you

The Digital Library Content

Essential elements for a digital library

Users

Content

Services

Content - requirements

Obtain

Store

Organize

Describe

Find

Deliver

Describing the content

How to describe content

What description elements

Dublin Core (http://dublincore.org/)

International standard

“a standard for cross-domain information resource description.”

15 descriptive elements

IEEE-LOM

Information about a resource, encoded in
the resource or associated with the
resource.

eXtensible Markup Language

XML

XML is a markup language

XML describes features

There is no standard XML

Use XML to create a resource type

Separately develop software to interact
with the data described by the XML
codes.

Source: tutorial at w3schools.com

XML rules

Easy rules, but very strict

First line is the version and character set used:

<?xml version="1.0" encoding="ISO-8859-1"?>

The rest is user-defined tags

Every tag has an opening and a closing

Element naming

XML elements must follow these naming
rules:

Names can contain letters, numbers, and other characters

Names must not start with the letters xml (or XML or Xml ..)

Names cannot contain spaces

Elements and attributes

Use elements to describe data

Use attributes to present information that is not part of the data

For example, the file type or some other information that would be useful in processing the data, but is not part of the data.
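As a small illustration of the distinction, the sketch below reads one hypothetical element with Python's standard `xml.etree.ElementTree` module. The element name, the `scale` attribute, and both values are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the element content is the data; the "scale" attribute
# is processing information about how to read that data.
doc = '<ingredient-amount scale="metric">250</ingredient-amount>'

amount = ET.fromstring(doc)
data = amount.text           # the data itself
scale = amount.get("scale")  # information about the data, not part of it

print(data, scale)
```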

Repeating elements

Naming an element means it appears
exactly once.

Name+ means it appears one or
more times

Name* means it appears 0 or more
times.

Name? means it appears 0 or one time.

Parts of an XML document

Elements

The components of an XML document

Some contain other parts, some are empty

Ex in HTML: “br” or “table”; in XML: “ingredient”

Attributes

Ex in HTML: “src=”; in XML: “scale=”

Entities

Special characters or strings with pre-assigned meaning

Ex in HTML: &nbsp; for non-breaking space

PCDATA

Parsed character data: text that will be parsed and interpreted by the reader. Tags and entities will be expanded and used in presentation.

CDATA

Character data: text that will not be parsed and interpreted. It will be displayed exactly as provided.

The HTML examples are familiar; the XML examples depend on the specific XML schema used.
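The difference between parsed and unparsed character data can be seen with a short sketch using Python's standard `xml.etree.ElementTree` parser. The `note` element and its text are made up for the example.

```python
import xml.etree.ElementTree as ET

# Parsed character data: the entity &amp; is expanded by the parser.
parsed = ET.fromstring("<note>Fish &amp; chips</note>")

# A CDATA section: markup and entities inside it are kept as literal text.
literal = ET.fromstring("<note><![CDATA[<b>Fish</b> &amp; chips]]></note>")

print(parsed.text)
print(literal.text)
```

The CDATA version produces no child `b` element, because the parser never interprets the text inside the section.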

Using XML - an example

Define the fields of a recipe collection:

<?xml version="1.0" encoding="ISO-8859-1"?>
<recipe>
  <recipe-title> </recipe-title>
  <ingredient-list>
    <ingredient>
      <ingredient-amount> </ingredient-amount>
      <ingredient-name> </ingredient-name>
    </ingredient>
  </ingredient-list>
  <directions>
  </directions>
</recipe>

ISO 8859 is a character set.

See http://www.bbsinc.com/iso8859.html

Processing the XML data

How do we know what to do with the
information in an XML file?

Document Type Definition (DTD)

Put in the same file as the data -- immediate reference

Put a reference to an external description

Provides the definition of the legitimate
content for each element

Document Type Definition

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE recipe [
<!ELEMENT recipe (recipe-title, ingredient-list, directions)>
<!ELEMENT recipe-title (#PCDATA)>
<!ELEMENT ingredient-list (ingredient)*>
<!ELEMENT ingredient (ingredient-amount, ingredient-name)>
<!ELEMENT ingredient-amount (#PCDATA)>
<!ELEMENT ingredient-name (#PCDATA)>
<!ELEMENT directions (#PCDATA)> ]>

(ingredient)* : repeat 0 or more times

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE recipe SYSTEM "recipe.dtd">
<recipe>
  <recipe-title> </recipe-title>
  <ingredient-list>
    <ingredient>
      <ingredient-amount>3</ingredient-amount>
      <ingredient-name>egg whites</ingredient-name>
    </ingredient>
    <ingredient>
      <ingredient-amount>1 cup</ingredient-amount>
      <ingredient-name>sugar</ingredient-name>
    </ingredient>
    <ingredient>
      <ingredient-amount>1 teaspoon</ingredient-amount>
      <ingredient-name>vanilla</ingredient-name>
    </ingredient>
    <ingredient>
      <ingredient-amount>2 cups</ingredient-amount>
      <ingredient-name>mini chocolate chips</ingredient-name>
    </ingredient>
  </ingredient-list>
  <directions>Beat the egg whites until stiff. Stir in sugar, then vanilla. Gently fold in chocolate chips. Place in warm oven at 200 degrees for an hour. Alternatively, place in an oven at 350 degrees. Turn oven off and leave overnight.
  </directions>
</recipe>

Not the way that I want to see a recipe in a magazine!

What could we do with a large collection of such entries?

How would we get the information entered into a collection?

External reference to DTD
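One thing software could do with such entries is pull out the ingredient list. The sketch below uses Python's standard `xml.etree.ElementTree` on a shortened copy of the recipe. The DOCTYPE line is left out because this parser does not validate against a DTD, and the helper name `ingredients` is an assumption for this example.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the recipe markup, without the DOCTYPE line.
recipe_xml = """
<recipe>
  <recipe-title> </recipe-title>
  <ingredient-list>
    <ingredient>
      <ingredient-amount>3</ingredient-amount>
      <ingredient-name>egg whites</ingredient-name>
    </ingredient>
    <ingredient>
      <ingredient-amount>1 cup</ingredient-amount>
      <ingredient-name>sugar</ingredient-name>
    </ingredient>
  </ingredient-list>
  <directions>Beat the egg whites until stiff.</directions>
</recipe>
"""


def ingredients(recipe):
    """Return (amount, name) pairs, trimming stray whitespace around the text."""
    return [
        (item.findtext("ingredient-amount", "").strip(),
         item.findtext("ingredient-name", "").strip())
        for item in recipe.iter("ingredient")
    ]


recipe = ET.fromstring(recipe_xml.strip())
for amount, name in ingredients(recipe):
    print(amount, name)
```

Run over a large collection, the same loop could build a shopping list or an index of recipes by ingredient.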

XML exercise

Design an XML schema for an application of

Examples -- DVD collection, …

Another example

A paper with content encoded with XML:

First few lines:

<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet href="ePBLpaper11.css" type="text/css"?>
<?xml-stylesheet href="ePBLpaper11.xsl" type="text/xsl"?>
<!DOCTYPE paper SYSTEM "ePBLpaper11.dtd">
<paper id="proj3">
  <info>
    <title>Standards E-learning and their possible support for a rich pedagogic approach in a 'Integrated Learning' context</title>
    <authors>
      <author>
        <firstname>Rodolophe</firstname>
        <familyname>Borer</familyname>
        <homepageurl>http://tecfa.unige.ch/perso/staf/borer/</homepageurl>
        <email/>
      </author>
    </authors>

"ePBLpaper11.dtd" shown on next slide

<?xml version="1.0" encoding="ISO-8859-1" ?>

<!-- _________ _____________________ -->
<!-- ePBL-project DTD for student project management & specification -->
<!-- -->
<!-- -->
<!-- Daniel K. Schneider -->
<!-- http://tecfa.unige.ch/tecfa-people/schneider.html -->
<!-- Created: 13/11/2002 (based on EVA_pm grammar) -->
<!-- Updated: 07/05/2004 -->
<!-- VERSIONS -->
<!-- v1.1 Adaptations to use with Morphon xml editor -->
<!-- ____________________ -->

<!-- _ ENTITY DECLARATIONS ______ -->

<!ENTITY % foreign-dtd SYSTEM "ibtwsh6_ePBL.dtd">
%foreign-dtd;

<!ENTITY % id "id ID #IMPLIED">

<!-- ______ MAIN ELEMENT _________ -->

<!ELEMENT project (name, authors, date, updated, goal, state-of-the-art, research-development-questions, methodology, workpackages ) >
<!ELEMENT name (#PCDATA )>
<!ELEMENT date (#PCDATA )>
<!ELEMENT authors (#PCDATA )>
<!ELEMENT updated (#PCDATA )>
<!ELEMENT goal (title, description )>
<!ELEMENT state-of-the-art %vert.model;>
<!ATTLIST state-of-the-art %id;>
<!ELEMENT research-development-questions (question )+>
<!ELEMENT question (title, description )>
<!ELEMENT methodology %vert.model;>
<!ATTLIST methodology %id;>
<!ELEMENT workpackages (workpackage )+>
<!ELEMENT workpackage (planning, objectives, deliverables )>
<!ATTLIST workpackage %id;>
<!ELEMENT objectives (objective )+>
<!ELEMENT objective (title, description )>
<!ELEMENT deliverables (deliverable )+>
<!ELEMENT deliverable (url, title, description )>
<!ELEMENT url (#PCDATA )>
<!ELEMENT planning (from, to, progress )>
<!ELEMENT from (#PCDATA )>
<!ELEMENT to (#PCDATA )>
<!ELEMENT progress (#PCDATA )>

<!-- ________________________ -->

<!ELEMENT title (#PCDATA )>
<!ATTLIST title %id;>
<!ELEMENT description %vert.model;>

<!-- _______________________ -->

Source: http://tecfa.unige.ch/staf/staf-j/vuilleum/staf18/p6/

Vocabulary

Given the need for processing, do you want free text
or restricted entries?

Free text gives more flexibility for the person making
the entry

Controlled vocabulary helps with

Consistent processing

Comparison between entries

Controlled vocabulary limits

Options for what is said

Vocabulary example

Recipe example

What text should be controlled?

What should be free text?

Ingredients

Ingredient-amount

Ingredient-name

Should we revise how we coded ingredient amount?

Directions
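One way to approach the question about ingredient amounts is to split the free-text amount into a number and a unit drawn from a controlled vocabulary. The sketch below is only an illustration: the unit list `UNITS`, the function `split_amount`, and the rule for handling plurals are all assumptions, not part of the recipe schema shown earlier.

```python
import re

# Hypothetical controlled vocabulary of units.
UNITS = {"cup", "teaspoon", "tablespoon"}


def split_amount(text):
    """Split '2 cups' into (2.0, 'cup'); a bare count such as '3' has unit None."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([a-z]+)?\s*", text.lower())
    if match is None:
        raise ValueError(f"unrecognized amount: {text!r}")
    quantity, unit = float(match.group(1)), match.group(2)
    if unit is not None:
        # Accept a simple plural ("cups") when the singular is in the vocabulary.
        if unit.endswith("s") and unit[:-1] in UNITS:
            unit = unit[:-1]
        if unit not in UNITS:
            raise ValueError(f"unit not in controlled vocabulary: {unit!r}")
    return quantity, unit


for raw in ["3 ", " 1 cup", "1 teaspoon ", "2 cups "]:
    print(repr(raw), split_amount(raw))
```

With the amount coded this way, entries can be compared or scaled consistently, at the cost of rejecting free-text amounts such as "a pinch".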

Dublin Core

Standard set of metadata fields for entries in
digital libraries:

Title, creator, subject, description, publisher,
contributor, date, type, format, identifier, source,
language, relation, coverage, rights
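A minimal sketch of how a few of these fields might be written as XML, using Python's standard `xml.etree.ElementTree`. The namespace URI is the one published for the 15 original Dublin Core elements; the wrapper element name `record`, the helper `dc_record`, and the sample values are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"  # namespace of the 15 original elements
ET.register_namespace("dc", DC)


def dc_record(fields):
    """Build a record with one dc:<name> child per field (wrapper name assumed)."""
    record = ET.Element("record")
    for name, value in fields.items():
        ET.SubElement(record, f"{{{DC}}}{name}").text = value
    return record


# Sample values are invented for the example.
record = dc_record({
    "title": "Digital Libraries: Models and Content",
    "type": "Text",
    "format": "text/html",
    "language": "en",
})
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```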

Dublin Core elements

see: http://dublincore.org/documents/dces/

Title

Creator: entity primarily responsible for making content of the resource

Subject (C)

Description

Publisher: entity making the resource available

Contributor: contributor to content of the resource

Date: YYYY-MM-DD, ex.

Type (C): ex: collection, dataset, event, image

Format (C): what is needed to display or operate the resource

Identifier: unambiguous ID

Source

Language: standards RFC 3066, ISO639

Relation: ref. to related resource

Coverage (C): space, time, jurisdiction

Rights: rights management information

C = controlled vocabulary recommended.

Dublin Core Terms

An update to the original DC elements

Adds the concept of range and domain

Each term has this minimal set of attributes:

Name:

A token appended to the URI of a DCMI namespace to
create the URI of the term.

Label:

The human-readable label assigned to the term.

URI:

The Uniform Resource Identifier used to uniquely
identify a term.

Definition:

A statement that represents the concept and
essential nature of the term.

Type of Term:

The type of term as described in the DCMI
Abstract Model [DCAM].

DC Terms

Comment:

See:

Authoritative documentation related to the term.

References:

A resource referenced in the Definition or Comment.

Refines:

A Property of which the described term is a Sub-Property.

Broader Than:

A Class of which the described term is a Super-Class.

Narrower Than:

A Class of which the described term is a Sub-Class.

Has Domain:

A Class of which a resource described by the term is an
Instance.

Has Range:

A Class of which a value described by the term is an Instance.

Member Of:

An enumerated set of resources (Vocabulary Encoding Scheme)
of which the term is a Member.

Instance Of:

A Class of which the described term is an instance.

Version:

A specific historical description of a term.

Equivalent Property:

A Property to which the described term is equivalent.

The DC Terms

from 15 to …

abstract, accessRights, accrualMethod, accrualPeriodicity, accrualPolicy, alternative, audience, available, bibliographicCitation, conformsTo, contributor, coverage, created, creator, date, dateAccepted, dateSubmitted, description, educationLevel, extent, format, hasFormat, hasPart, hasVersion, identifier, instructionalMethod, isFormatOf, isPartOf, isReferencedBy, isReplacedBy, isRequiredBy, issued, isVersionOf, language, license, mediator, medium, modified, provenance, publisher, references, relation, replaces, requires, rights, rightsHolder, source, spatial, subject, tableOfContents, temporal, title, type, valid

DC terms

See http://dublincore.org/documents/dcmi-terms/

Review the list and see what has been added

A Drupal example

Ensemble: www.computingportal.org

IEEE-LOM

Example of a specialized metadata scheme

Specifically for collections of educational materials

Includes all of Dublin Core

See http://projects.ischool.washington.edu/sasutton/IEEE1484.html

Computing systems

Linux machines

Introduction to Unix: http://www.csc.villanova.edu/~lab/unix/

DSpace: http://www.dspace.org/

Documentation, including installation
-

machines. He will send a message to the class by the
middle of the week with details of machine location and

Remember - you have the option to use your own machine, but must meet the criteria described last week.

This session

Defined metadata and its role in digital libraries.

Introduced XML as a language for describing
a collection of content.

Described the computing resources and how
to get ready for the first DL setup.