Content


5 Dec 2013


Digital Libraries

Models and Content

Goals for tonight


Finish up from last week


the 5 S model more formally


Status of the systems available


Obtaining, describing, indexing content


XML


Dublin Core


Introducing content exchanges (OAI)


Applying the 5S model, informally

Choose a subject area


then answer the questions


Stream - What types of data? gif, jpg, avi, docx, pdf, html?

Structure - How are the elements organized? Is there a hierarchy? Are there multiple structures?

Spaces - How will we index the items? How will we divide them into related groups?

Scenarios - What services will we provide? What information do we need to provide those services? What events might happen that we need to plan for?

Societies - Who is the library intended to serve? Remember to include agents and other processes as well as users.

This is the first deliverable for your first project.

More formally: Definitions

Definition: A stream is a sequence whose codomain is a non-empty set.

Definition: A structure is a tuple (G, L, F) where G = (V, E) is a directed graph with vertex set V and edge set E, L is a set of label values, and F is a labeling function F : (V ∪ E) → L.

See http://www.mathsisfun.com/sets/domain-range-codomain.html for a nice description of domain, range, codomain if you need it.

Structure illustration

[Figure: a Collection node with "includes" edges to Images, Audio files, and Books]

A very simple structure. How might it be enhanced? How would an index be included? What substructures might be added?

What are the G, L, F, V, E parts of this example?
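
One way to approach the last question is to write the example down as data. The following is a minimal Python sketch, assuming the illustration is a Collection that "includes" Images, Audio files, and Books. The vertex label "item" and the helper name `label_of` are illustrative choices, not part of the 5S definition.

```python
# Vertices V and directed edges E form the graph G = (V, E).
V = {"Collection", "Images", "Audio files", "Books"}
E = {
    ("Collection", "Images"),
    ("Collection", "Audio files"),
    ("Collection", "Books"),
}
G = (V, E)

# L is the set of label values (assumed: one vertex label, one edge label).
L = {"item", "includes"}

# F maps every vertex and every edge to a label in L.
F = {v: "item" for v in V}
F.update({e: "includes" for e in E})


def label_of(x):
    """Return the label that F assigns to a vertex or an edge."""
    return F[x]


structure = (G, L, F)
print(label_of(("Collection", "Books")))
```

Adding an index or a substructure would then mean adding vertices, edges, and possibly new label values to the same three parts.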

Definitions, cont’d

Definition: A space is a measurable space, measure space, probability space, vector space, topological space, or metric space.

A vector space is a representation for the set of elements in a collection. The vector representing each element is the set of characteristics held by that element; it both connects that element to others that are similar and distinguishes it from those that are different.

We will do an exercise to illustrate.

Vector space illustration


Consider a car. What are the characteristics that
you associate with a car?



If you want to compare one car to another, what
characteristics would you choose?


If you wanted to distinguish a car from another type
of vehicle, what characteristics would you need?


distinguish from a snowmobile


distinguish from a truck


Make a vector of those characteristics.


Then, fill in the vector for several specific cars.
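
The exercise can also be tried in code. This is a rough Python sketch, not a prescribed solution: the characteristic names in `FEATURES` and every value in `vehicles` are made up for illustration, and cosine similarity is just one common way to compare vectors.

```python
import math

# Assumed characteristics, in a fixed order: each position is one dimension.
FEATURES = ["wheels", "seats", "enclosed_cabin", "cargo_bed", "runs_on_snow"]

# Hypothetical values, filled in by hand for illustration only.
vehicles = {
    "sedan":      [4, 5, 1, 0, 0],
    "pickup":     [4, 3, 1, 1, 0],
    "snowmobile": [0, 2, 0, 0, 1],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1 means similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


for other in ("pickup", "snowmobile"):
    score = cosine_similarity(vehicles["sedan"], vehicles[other])
    print(f"sedan vs {other}: {score:.2f}")
```

With these made-up values the sedan scores closer to the pickup than to the snowmobile, which is the kind of grouping a vector space is meant to support.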

Definitions - 3

Definition: A scenario is a sequence of related transition events (e_1, e_2, …, e_n) on state set S such that e_k = (s_k, s_{k+1}) for 1 ≤ k ≤ n.

More easily visualized, a scenario is a path in a directed graph, G = (S, Σ_e), where vertices correspond to states in the state set S and directed edges are equivalent to events in a set of events, Σ_e, and correspond to transitions between states.


Scenarios must be implemented to make a working
system.
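
The path view of a scenario can be checked mechanically. Below is a small Python sketch under stated assumptions: the states, the events, and the function name `is_scenario` are hypothetical and describe an invented "search, then view an item" service.

```python
# Hypothetical state set S for a small search service.
S = {"home", "results", "item"}

# Each event is a transition (s_k, s_k+1) between two states of S.
EVENTS = {
    ("home", "results"),
    ("results", "item"),
    ("item", "results"),
    ("results", "home"),
}


def is_scenario(path, states, transitions):
    """True if path is a non-empty chain of known transitions.

    Every event must be in the event set, and each event must start
    in the state where the previous event ended.
    """
    if not path:
        return False
    for src, dst in path:
        if src not in states or dst not in states:
            return False
        if (src, dst) not in transitions:
            return False
    return all(path[k][1] == path[k + 1][0] for k in range(len(path) - 1))


search_and_view = [("home", "results"), ("results", "item")]
print(is_scenario(search_and_view, S, EVENTS))
```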

Definitions - 4

Definition: A society is a tuple (C, R) where

C = {c_1, c_2, …, c_n} is a set of conceptual communities, each community referring to a set of individuals of the same class or type (e.g. actors, activities, components, hardware, software, data);

R = {r_1, r_2, …, r_m} is a set of relationships, each relationship being a tuple r_j = (e_j, i_j) where e_j is a Cartesian product c_{k_1} × c_{k_2} × … × c_{k_{n_j}}, 1 ≤ k_1 < k_2 < … < k_{n_j} ≤ n, which specifies the communities involved in the relationship, and i_j is an activity.
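
As a rough illustration of the definition, the sketch below builds a tiny society in Python. The community names, their members, and the activities are all invented for the example; `relationship` is a hypothetical helper, not part of the 5S model.

```python
from itertools import product

# Communities c_1 .. c_n, each a set of individuals of one type.
# Note that one community is a software agent, not a person.
communities = {
    "students": {"ana", "ben"},
    "librarians": {"lee"},
    "indexers": {"crawler"},
}


def relationship(community_names, activity):
    """Build r_j = (e_j, i_j): the Cartesian product of the named
    communities, paired with the activity that links them."""
    e = set(product(*(communities[name] for name in community_names)))
    return (e, activity)


R = [
    relationship(["students", "librarians"], "asks_for_help"),
    relationship(["librarians", "indexers"], "configures"),
]
society = (communities, R)

for e, activity in R:
    print(activity, sorted(e))
```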


Projects in our DL laboratory


Mendel 289 is the center of activity for projects
related to digital libraries and similar projects.


Summary of the projects under way, which may
present opportunities for class projects or for
independent study


NSDL, CITIDEL, CSTA, Ensemble, Distributed
Expertise, Computing Ontology, Interdisciplinary
Computing and its relationship to the libraries ….

Our systems


Now available


Fedora Linux machines, remotely accessible (use the gateway)

Bare machines with just the basic system

We can install Drupal either from the Drupal site (doing things for ourselves) or from the Bitnami site (builds the stack for us)

I just heard that Drupal may already be installed. Feel free to uninstall and reinstall if you wish.

If you have a computer of your own and want to use it, fine, but you must be able to demonstrate it to the class at the end of the semester. I will need to be able to see what you are doing from time to time during the semester.

That means you need a static IP address.

The Digital Library Content


Essential elements for a digital library


Users


Content


Services

Content - requirements


Obtain


Store


Organize


Describe


Find


Deliver

Describing the content


How to describe content


Metadata


Machine readable description of anything


What description


Machine readable requires standard descriptive
elements


Dublin Core (http://dublincore.org/)

International standard

“a standard for cross-domain information resource description.”

15 descriptive elements

Other metadata schemes

IEEE-LOM

Metadata


What does metadata look like?


Metadata is data about data


Information about a resource, encoded in
the resource or associated with the
resource.


The language of metadata: XML


eXtensible Markup Language

XML


XML is a markup language


XML describes features


There is no standard set of XML tags; each application defines its own


Use XML to create a resource type


Separately develop software to interact
with the data described by the XML
codes.

Source: tutorial at w3schools.com

XML rules


Easy rules, but very strict


First line is the version and character set used:

<?xml version="1.0" encoding="ISO-8859-1"?>

The rest is user defined tags

Every tag has an opening and a closing


Element naming



XML elements must follow these naming
rules:


Names can contain letters, numbers, and other characters


Names must not start with a number or punctuation character


Names must not start with the letters xml (or XML or Xml ..)


Names cannot contain spaces


Elements and attributes


Use elements to describe data

Use attributes to present information that is not part of the data

For example, the file type or some other information that would be useful in processing the data, but is not part of the data.
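
To make the element/attribute split concrete, here is a short Python sketch using the standard library's `xml.etree.ElementTree`. The element names follow the recipe example used in these slides; the `scale` attribute and its value "metric" are made up for illustration.

```python
import xml.etree.ElementTree as ET

# The element text carries the data itself.
# The attribute carries information about how to process that data.
ingredient = ET.Element("ingredient", attrib={"scale": "metric"})

amount = ET.SubElement(ingredient, "ingredient-amount")
amount.text = "250 g"

name = ET.SubElement(ingredient, "ingredient-name")
name.text = "sugar"

# Serialize to a string so the markup can be inspected.
xml_text = ET.tostring(ingredient, encoding="unicode")
print(xml_text)
```

A program reading this entry can check `scale` before deciding how to interpret the amount, without the attribute ever appearing as part of the recipe text.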


Repeating elements


Naming an element means it appears exactly once.

Name+ means it appears one or more times.

Name* means it appears 0 or more times.

Name? means it appears 0 or one time.
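
These indicators behave like the same symbols in regular expressions. The sketch below uses that analogy to check the order and count of child elements in Python. It is only an illustration of the idea, not a DTD validator (the standard-library ElementTree parser does not validate against a DTD); the `book` content model and the helper `matches_model` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET


def matches_model(element, model_regex):
    """Check the sequence of child tag names against a regex model."""
    children = "".join(child.tag + "," for child in element)
    return re.fullmatch(model_regex, children) is not None


# Hypothetical content model: title, author+, note?
MODEL = r"title,(author,)+(note,)?"

ok = ET.fromstring("<book><title/><author/><author/></book>")
bad = ET.fromstring("<book><title/><note/></book>")  # no author

print(matches_model(ok, MODEL))
print(matches_model(bad, MODEL))
```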


Parts of an XML document


Elements


The components of an XML document


Some contain other parts, some are empty


Ex in HTML: “br” or “table”; in XML: “ingredient”

Attributes

Information about elements, not data

Ex in HTML: “src=”; in XML: “scale=”

Entities

Special characters or strings with pre-assigned meaning

Ex in HTML: &nbsp; for non-breaking space

PCDATA

Parsed character data: text that will be parsed and interpreted by the reader. Tags and entities will be expanded and used in presentation.

CDATA

Character data: text that will not be parsed and interpreted. It will be displayed exactly as provided.

The HTML examples are familiar; the XML examples are made up and depend on the specific XML scheme used.
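
The difference between parsed and unparsed character data can be seen by feeding both to a parser. This is a small Python sketch using the standard library; the `note` element and its content are invented for the example.

```python
import xml.etree.ElementTree as ET

# Parsed character data: the entity is expanded and <b> becomes a child element.
parsed = ET.fromstring("<note>Fish &amp; <b>chips</b></note>")

# A CDATA section: the same characters are kept exactly as written.
literal = ET.fromstring("<note><![CDATA[Fish &amp; <b>chips</b>]]></note>")

print(repr(parsed.text), len(parsed))
print(repr(literal.text), len(literal))
```

In the first case the parser reports one child element and the text before it; in the second it reports no children, only the literal string.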

Using XML - an example

Define the fields of a recipe collection:

<?xml version="1.0" encoding="ISO-8859-1"?>
<recipe>
  <recipe-title> </recipe-title>
  <ingredient-list>
    <ingredient>
      <ingredient-amount> </ingredient-amount>
      <ingredient-name> </ingredient-name>
    </ingredient>
  </ingredient-list>
  <directions>
  </directions>
</recipe>

ISO 8859 is a character set.

See http://www.bbsinc.com/iso8859.html


Processing the XML data


How do we know what to do with the
information in an XML file?


Document Type Definition (DTD)


Put in the same file as the data -- immediate reference


Put a reference to an external description


Provides the definition of the legitimate
content for each element

Document Type Definition

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE recipe [
<!ELEMENT recipe (recipe-title, ingredient-list, directions)>
<!ELEMENT recipe-title (#PCDATA)>
<!ELEMENT ingredient-list (ingredient)*>
<!ELEMENT ingredient (ingredient-amount, ingredient-name)>
<!ELEMENT ingredient-amount (#PCDATA)>
<!ELEMENT ingredient-name (#PCDATA)>
<!ELEMENT directions (#PCDATA)> ]>

Repeat 0 or more times: the * lets an ingredient-list hold any number of ingredient entries.

External reference to DTD:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE recipe SYSTEM "recipe.dtd">
<recipe>
  <recipe-title>Meringue cookies</recipe-title>
  <ingredient-list>
    <ingredient>
      <ingredient-amount>3</ingredient-amount>
      <ingredient-name>egg whites</ingredient-name>
    </ingredient>
    <ingredient>
      <ingredient-amount>1 cup</ingredient-amount>
      <ingredient-name>sugar</ingredient-name>
    </ingredient>
    <ingredient>
      <ingredient-amount>1 teaspoon</ingredient-amount>
      <ingredient-name>vanilla</ingredient-name>
    </ingredient>
    <ingredient>
      <ingredient-amount>2 cups</ingredient-amount>
      <ingredient-name>mini chocolate chips</ingredient-name>
    </ingredient>
  </ingredient-list>
  <directions>Beat the egg whites until stiff. Stir in sugar, then vanilla. Gently fold in chocolate chips. Place in warm oven at 200 degrees for an hour. Alternatively, place in an oven at 350 degrees. Turn oven off and leave overnight.
  </directions>
</recipe>

Not the way that I want to see a recipe in a magazine!

What could we do with a large collection of such entries?

How would we get the information entered into a collection?
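
One possible answer to what could be done with a large collection of such entries is to let software read them. The sketch below parses a shortened copy of the recipe with Python's standard library and searches by ingredient. The helper names `ingredients` and `recipes_using` are hypothetical, and the DOCTYPE line is left out because this parser does not validate.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the recipe entry, for illustration.
RECIPE_XML = """
<recipe>
  <recipe-title>Meringue cookies</recipe-title>
  <ingredient-list>
    <ingredient>
      <ingredient-amount>3</ingredient-amount>
      <ingredient-name>egg whites</ingredient-name>
    </ingredient>
    <ingredient>
      <ingredient-amount>1 cup</ingredient-amount>
      <ingredient-name>sugar</ingredient-name>
    </ingredient>
  </ingredient-list>
  <directions>Beat the egg whites until stiff.</directions>
</recipe>
""".strip()


def ingredients(recipe):
    """Return (amount, name) pairs for every ingredient in a recipe."""
    return [
        (item.findtext("ingredient-amount").strip(),
         item.findtext("ingredient-name").strip())
        for item in recipe.iter("ingredient")
    ]


def recipes_using(recipes, ingredient_name):
    """Return the titles of all recipes that list the given ingredient."""
    return [
        r.findtext("recipe-title").strip()
        for r in recipes
        if any(name == ingredient_name for _, name in ingredients(r))
    ]


recipe = ET.fromstring(RECIPE_XML)
print(ingredients(recipe))
print(recipes_using([recipe], "sugar"))
```

The same two functions would work unchanged over hundreds of entries, which is the practical payoff of agreeing on the tags in advance.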

XML exercise


Design an XML schema for an application of
your choice. Keep it simple.


Examples -- address book, TV program listing, DVD collection, …

Another example


A paper with content encoded with XML:
http://tecfaseed.unige.ch/staf18/modules/ePBL/uploads/proj3/paper81.xml


First few lines:

<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet href="ePBLpaper11.css" type="text/css"?>
<?xml-stylesheet href="ePBLpaper11.xsl" type="text/xsl"?>
<!DOCTYPE paper SYSTEM "ePBLpaper11.dtd">
<paper id="proj3">
  <info>
    <title>Standards E-learning and their possible support for a rich pedagogic approach in a 'Integrated Learning' context</title>
    <authors>
      <author>
        <firstname>Rodolophe</firstname>
        <familyname>Borer</familyname>
        <homepageurl>http://tecfa.unige.ch/perso/staf/borer/</homepageurl>
        <email/>
      </author>
    </authors>

"ePBLpaper11.dtd" shown on next slide



<?xml version="1.0" encoding="ISO-8859-1" ?>
<!-- _________ _____________________ -->
<!-- ePBL-project DTD for student project management & specification -->
<!-- Copyright: (2004) Paraskevi.Synteta@tecfa.unige.ch -->
<!-- http://tecfa.unige.ch/~paraskev/ -->
<!-- Daniel K. Schneider -->
<!-- http://tecfa.unige.ch/tecfa-people/schneider.html -->
<!-- Created: 13/11/2002 (based on EVA_pm grammar) -->
<!-- Updated: 07/05/2004 -->
<!-- VERSIONS -->
<!-- v1.1 Adaptations to use with Morphon xml editor and addition of IDs -->
<!-- ____________________ -->
<!-- _ ENTITY DECLARATIONS ______ -->
<!ENTITY % foreign-dtd SYSTEM "ibtwsh6_ePBL.dtd">
%foreign-dtd;
<!ENTITY % id "id ID #IMPLIED">
<!-- ______ MAIN ELEMENT _________ -->

<!ELEMENT project (name, authors, date, updated, goal, state-of-the-art, research-development-questions, methodology, workpackages )>
<!ELEMENT name (#PCDATA )>
<!ELEMENT date (#PCDATA )>
<!ELEMENT authors (#PCDATA )>
<!ELEMENT updated (#PCDATA )>
<!ELEMENT goal (title, description )>
<!ELEMENT state-of-the-art %vert.model;>
<!ATTLIST state-of-the-art %id;>
<!ELEMENT research-development-questions (question )+>
<!ELEMENT question (title, description )>
<!ELEMENT methodology %vert.model;>
<!ATTLIST methodology %id;>
<!ELEMENT workpackages (workpackage )+>
<!ELEMENT workpackage (planning, objectives, deliverables )>
<!ATTLIST workpackage %id;>
<!ELEMENT objectives (objective )+>
<!ELEMENT objective (title, description )>
<!ELEMENT deliverables (deliverable )+>
<!ELEMENT deliverable (url, title, description )>
<!ELEMENT url (#PCDATA )>
<!ELEMENT planning (from, to, progress )>
<!ELEMENT from (#PCDATA )>
<!ELEMENT to (#PCDATA )>
<!ELEMENT progress (#PCDATA )>
<!-- ________________________ -->
<!ELEMENT title (#PCDATA )>
<!ATTLIST title %id;>
<!ELEMENT description %vert.model;>
<!-- _______________________ -->

Source: http://tecfa.unige.ch/staf/staf-j/vuilleum/staf18/p6/

Vocabulary


Given the need for processing, do you want free text
or restricted entries?


Free text gives more flexibility for the person making
the entry


Controlled vocabulary helps with


Consistent processing


Comparison between entries


Controlled vocabulary limits


Options for what is said

Vocabulary example


Recipe example


What text should be controlled?


What should be free text?


Ingredients


Ingredient-amount

Ingredient-name


Should we revise how we coded ingredient amount?


Directions
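
One way to think about revising the ingredient amount is to split it into a number and a unit, and to draw the unit from a controlled list. The Python sketch below shows that idea only; the unit list, the alias table, and the function `parse_amount` are assumptions made for illustration, not a recommended vocabulary.

```python
# Assumed controlled vocabulary of units.
# The empty string stands for a plain count, as in "3" egg whites.
UNITS = {"", "cup", "teaspoon", "tablespoon"}

# Free-text variants mapped onto the controlled terms (illustrative).
ALIASES = {
    "cups": "cup",
    "teaspoons": "teaspoon",
    "tsp": "teaspoon",
    "tablespoons": "tablespoon",
    "tbsp": "tablespoon",
}


def parse_amount(text):
    """Split an amount such as '2 cups' into (2.0, 'cup').

    Raises ValueError if the text is empty, the quantity is not a
    number, or the unit is outside the controlled vocabulary.
    """
    parts = text.split()
    if not parts:
        raise ValueError("empty amount")
    quantity = float(parts[0])
    unit = " ".join(parts[1:]).lower()
    unit = ALIASES.get(unit, unit)
    if unit not in UNITS:
        raise ValueError(f"unit not in controlled vocabulary: {unit!r}")
    return quantity, unit


for entry in ("3", "1 cup", "1 teaspoon", "2 cups"):
    print(entry, "->", parse_amount(entry))
```

With amounts stored this way, entries can be compared or scaled consistently, at the cost of rejecting anything the vocabulary does not cover.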

Dublin Core


Standard set of metadata fields for entries in
digital libraries:


Title, creator, subject, description, publisher,
contributor, date, type, format, identifier, source,
language, relation, coverage, rights


Dublin Core elements

see: http://dublincore.org/documents/dces/

C = controlled vocabulary recommended.

Title

Creator: Entity primarily responsible for making content of the resource

Subject - C

Description

Publisher: Entity making the resource available

Contributor: Contributor to content of the resource

Date: YYYY-MM-DD, for example

Type - C: Ex: collection, dataset, event, image

Format - C: What is needed to display or operate the resource.

Identifier: Unambiguous ID

Source

Language: Standards RFC 3066, ISO 639

Relation: Ref. to related resource

Coverage - C: Space, time, jurisdiction.

Rights: Rights Management information
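
To show what a Dublin Core description might look like in XML, here is a short Python sketch. The namespace URI is the one DCMI publishes for the 15 elements; the wrapper element `record`, the helper `dc_record`, and all field values are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace for the 15 Dublin Core elements.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dc_record(fields):
    """Build a record element with one dc:* child per (name, value) pair."""
    record = ET.Element("record")  # wrapper name is an assumption
    for name, value in fields:
        child = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        child.text = value
    return record


# Hypothetical values describing the recipe entry.
record = dc_record([
    ("title", "Meringue cookies"),
    ("type", "Text"),
    ("format", "text/xml"),
    ("language", "en"),
    ("date", "2010-01-31"),
])

xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

The fields marked C above are the ones where values such as "Text" or "en" would normally come from a controlled list rather than free text.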

Dublin Core Terms


An update to the original DC elements


Adds the concept of range and domain


Each term has this minimal set of attributes:


Name: A token appended to the URI of a DCMI namespace to create the URI of the term.

Label: The human-readable label assigned to the term.

URI: The Uniform Resource Identifier used to uniquely identify a term.

Definition: A statement that represents the concept and essential nature of the term.

Type of Term: The type of term as described in the DCMI Abstract Model [DCAM].
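
The relation between Name and URI is simple string concatenation, as this small Python sketch shows. It assumes the DCMI terms namespace and uses the term "abstract" as the example; the dictionary layout and the helper `term_uri` are illustrative only.

```python
# Namespace URI for the DCMI Metadata Terms.
DCTERMS_NS = "http://purl.org/dc/terms/"


def term_uri(name, namespace=DCTERMS_NS):
    """Append the term's name token to the namespace URI."""
    return namespace + name


# A partial description of one term (the definition text is omitted here).
term = {
    "name": "abstract",
    "label": "Abstract",
    "uri": term_uri("abstract"),
    "type_of_term": "Property",
}

print(term["uri"])
```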

DC Terms

Additional attributes possible:

Comment: Additional information about the term or its application.

See: Authoritative documentation related to the term.

References: A resource referenced in the Definition or Comment.

Refines: A Property of which the described term is a Sub-Property.

Broader Than: A Class of which the described term is a Super-Class.

Narrower Than: A Class of which the described term is a Sub-Class.

Has Domain: A Class of which a resource described by the term is an Instance.

Has Range: A Class of which a value described by the term is an Instance.

Member Of: An enumerated set of resources (Vocabulary Encoding Scheme) of which the term is a Member.

Instance Of: A Class of which the described term is an instance.

Version: A specific historical description of a term.

Equivalent Property: A Property to which the described term is equivalent.


The DC Terms

from 15 to …

abstract, accessRights, accrualMethod, accrualPeriodicity, accrualPolicy, alternative, audience, available, bibliographicCitation, conformsTo, contributor, coverage, created, creator, date, dateAccepted, dateCopyrighted, dateSubmitted, description, educationLevel, extent, format, hasFormat, hasPart, hasVersion, identifier, instructionalMethod, isFormatOf, isPartOf, isReferencedBy, isReplacedBy, isRequiredBy, issued, isVersionOf, language, license, mediator, medium, modified, provenance, publisher, references, relation, replaces, requires, rights, rightsHolder, source, spatial, subject, tableOfContents, temporal, title, type, valid

DC terms


See http://dublincore.org/documents/dcmi-terms/


Review the list and see what has been added

A Drupal example


Ensemble:
www.computingportal.org



IEEE-LOM


Example of a specialized metadata scheme


Learning Object Metadata


Specifically for collections of educational materials


Includes all of Dublin Core


See
http://projects.ischool.washington.edu/sasutton/IEEE1484.html

Computing systems


Linux machines


Introduction to Unix: http://www.csc.villanova.edu/~lab/unix/

DSpace: http://www.dspace.org/

Documentation, including installation: http://www.dspace.org/index.php?option=com_content&task=view&id=151&Itemid=116


Najib Nadi, our system administrator, is setting up the
machines. He will send a message to the class by the
middle of the week with details of machine location and
login.

Remember - you have the option to use your own machine, but you must meet the criteria described last week.

This session


Defined metadata and its role in digital libraries.


Introduced XML as a language for describing
a collection of content.


Described the computing resources and how
to get ready for the first DL setup.