Introduction to XML

14 Dec 2013

XML Lecture 1


XML

Motivation & Syntax



Monica Farrow

email : M.Farrow@hw.ac.uk



XML Topics


This lecture


Motivation


Storing XML


Programming and XML


Syntax



Describing the document



DTD, XML Schema



Accessing the elements using XPath


Transforming XML using XSLT



XML in One Slide


Basically, XML is an annotated text file. The data
(an element) is surrounded by descriptive start
and end tags. Elements can have attributes listed
in the start tag.


Example:

<person>
  <name id="42">Lisa Simpson</name>
  <tel>0131-828-1234</tel>
  <tel>078-4701-7775</tel>
  <email>lisa@macs.hw.ac.uk</email>
</person>
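
As a rough illustration of how a program sees this example, the sketch below reads it with Python's standard xml.etree.ElementTree module. The variable names are illustrative only and are not part of the lecture.

```python
import xml.etree.ElementTree as ET

# The person record from the slide, held in a string for convenience.
PERSON_XML = """
<person>
  <name id="42">Lisa Simpson</name>
  <tel>0131-828-1234</tel>
  <tel>078-4701-7775</tel>
  <email>lisa@macs.hw.ac.uk</email>
</person>
"""

person = ET.fromstring(PERSON_XML)

# Element content is the text between the start tag and the end tag.
name = person.find("name")
print(name.text)

# Attributes listed in the start tag are available as a dictionary.
print(name.attrib["id"])

# Repeated elements come back as a list, in document order.
phones = [tel.text for tel in person.findall("tel")]
print(phones)
```

Note that the attribute value is returned as a string; the XML itself carries no type information.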



Motivation


XML allows us to create machine-readable text files.

In the file with Lisa’s data, without XML tags, how can we easily specify a semi-structured format? E.g.
  Compulsory name
  Between 0 and 4 telephone numbers
  Optional email

Using XML, the data is labelled with tags, so it can be easily identified.

The next few slides show some uses of XML:



Application data


Applications can use XML to store, transmit, and display data.

E.g. To keep track of the updates which have been downloaded
  Version number, file names, installation time, etc.

E.g. To specify start-up settings or parameters
  These can be very extensive, can be generated by ‘wizards’ and modified by humans

E.g. To send data between the server and the client in web applications (jQuery and JavaScript)
  More about this later



Web services


“A Web service is a software function provided at a network address over the web or the cloud, it is a service that is "always on"” (Wikipedia)

It’s not used through a GUI by a person

A software developer could use a web service within an application.

They use XML to tag the data.

Protocols based on XML are used to:
  Transfer the data (SOAP)
  Describe the service (WSDL)
  List available services (UDDI)


Web services - SOAP

SOAP: Simple Object Access Protocol

For exchanging data between any web applications

<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope
    xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Header> SOAP Example </soap:Header>
  <soap:Body>
    <desks:NumberInStock>200</desks:NumberInStock>
  </soap:Body>
</soap:Envelope>
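
A client receiving such a message has to look elements up by namespace as well as by name. The sketch below, using Python's standard library, is one way this might be done. The slide does not declare the desks prefix, and a namespace-aware parser rejects an undeclared prefix, so the URI http://example.org/desks is a placeholder assumed here purely for illustration.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: the slide never declares the "desks" prefix, so a
# placeholder namespace URI (http://example.org/desks) is declared here.
SOAP_MESSAGE = """<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope
    xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:desks="http://example.org/desks">
  <soap:Header> SOAP Example </soap:Header>
  <soap:Body>
    <desks:NumberInStock>200</desks:NumberInStock>
  </soap:Body>
</soap:Envelope>
"""

# Map prefixes to namespace URIs for use in search paths.
NAMESPACES = {
    "soap": "http://schemas.xmlsoap.org/soap/envelope/",
    "desks": "http://example.org/desks",  # hypothetical, see above
}

# Encode to bytes because the text carries an encoding declaration.
envelope = ET.fromstring(SOAP_MESSAGE.encode("utf-8"))

# Find the payload inside the SOAP body and convert it to a number.
stock = envelope.find("soap:Body/desks:NumberInStock", NAMESPACES)
number_in_stock = int(stock.text)
print(number_in_stock)
```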



Write Once Use Everywhere

[Diagram: an XML document is transformed via XSL into XHTML (browser for mobile), XHTML (web browser on PC) and TEXT (Excel)]

Separation of content from presentation


“Write once read anywhere”


The same document can be transformed using XSL
(eXtensible stylesheet language) into different
formats



Some existing XML-based languages

XHTML
  XML compatible version of HTML

DocBook
  For any documentation. Tags such as title, chapter, para, etc.

ODF (OpenDocument Format)
  For office documents such as word processing or spreadsheets. Used by OpenOffice.

MathML
  To describe mathematical formulae



XML data file Storage


3 options

As a text file
  simple
  used in this course

In a ‘native’ XML database (NXD)
  Designed especially for XML, holds a collection of XML documents
  Many different ones on the market
  non standard
  Extract data with XPath, XSLT (introduced in the 3rd XML lecture) or the XML query language XQuery, with its FLWOR expressions (not covered in course)

Using a relational DBMS (now SQL has XML functions too)
  EITHER store the XML document as the value of some field within a row
  OR store the XML in a shredded form across a number of fields and tables
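
The two relational options can be contrasted with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for a relational DBMS and does not use any SQL/XML functions. The table and column names are hypothetical, chosen only for this illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

PERSON_XML = (
    '<person ssno="113"><name>Lisa Simpson</name>'
    "<tel>0131-828 1234</tel><tel>078-4701 7775</tel>"
    "<email>lisa@macs.hw.ac.uk</email></person>"
)

db = sqlite3.connect(":memory:")

# Option A: keep the whole document as the value of one field.
db.execute("CREATE TABLE person_doc (ssno TEXT PRIMARY KEY, doc TEXT)")
db.execute("INSERT INTO person_doc VALUES (?, ?)", ("113", PERSON_XML))

# Option B: shred the document across fields and tables.
db.execute("CREATE TABLE person (ssno TEXT PRIMARY KEY, name TEXT, email TEXT)")
db.execute("CREATE TABLE tel (ssno TEXT REFERENCES person(ssno), number TEXT)")

person = ET.fromstring(PERSON_XML)
ssno = person.get("ssno")
db.execute(
    "INSERT INTO person VALUES (?, ?, ?)",
    (ssno, person.findtext("name"), person.findtext("email")),
)
db.executemany(
    "INSERT INTO tel VALUES (?, ?)",
    [(ssno, tel.text) for tel in person.findall("tel")],
)

# The shredded form can be queried with ordinary SQL ...
numbers = [
    row[0]
    for row in db.execute(
        "SELECT number FROM tel WHERE ssno = ? ORDER BY rowid", ("113",)
    )
]
print(numbers)

# ... whereas the stored document must be parsed again after retrieval.
(doc,) = db.execute(
    "SELECT doc FROM person_doc WHERE ssno = ?", ("113",)
).fetchone()
stored_name = ET.fromstring(doc).findtext("name")
print(stored_name)
```

Storing the whole document keeps it intact but opaque to SQL; shredding makes every value queryable at the cost of a fixed table design.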



XML and Programming


To read an XML document in a programming language, the processing steps are:
  Reading the raw data as a stream of characters
  Parsing the raw data
    Recognising tags, content, attribute pairs
  Passing the result to a client class or function for application specific processing

Many programming languages have a library of functions using the Document Object Model (DOM), a tree-based interface
  The programmer can navigate up and down the tree.
  Details not covered in the course
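
Although the details are outside the course, a brief sketch may make "navigating up and down the tree" concrete. It uses xml.dom.minidom, the small DOM implementation in Python's standard library; other languages offer similar DOM libraries with their own naming. The document is written without whitespace between tags so that no whitespace-only text nodes appear in the tree.

```python
from xml.dom import minidom

# Parsing turns the raw characters into a tree of nodes.
doc = minidom.parseString(
    "<person><name>Bart Simpson</name><tel>0131 444 7777</tel></person>"
)

# Down the tree: root element, its first child, then that child's text node.
root = doc.documentElement      # <person>
name = root.firstChild          # <name>
text = name.firstChild          # text node inside <name>

# Sideways to the next sibling, then back up to the parent.
tel = name.nextSibling          # <tel>
parent = tel.parentNode         # <person> again

print(root.tagName, name.tagName, text.data, tel.tagName)
```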


XML Syntax


XML Overview


XML is a ‘human-legible’ simplified subset of the Standard Generalized Markup Language (SGML), on which HTML is also based

Data is divided into elements and attributes. Each element is surrounded by a start tag and an end tag. The end tag resembles the start tag but includes a forward slash before the tagname.

<tel>0131 444 7777</tel>

Tagnames are chosen to reflect the meaning of the element content

(In HTML, tagnames are chosen to indicate page structure)

[Diagram labels: HTML, XML, SGML]


Elements


The segment of an XML document between an opening tag and the corresponding closing tag is called an element

Elements may contain text or other elements

<person>
  <name>Bart Simpson</name>
  <tel>0131 444 7777</tel>
  <tel>078 4011 6022</tel>
  <email>bart@ed.ac.uk</email>
</person>

Slide annotations:
  An element can contain other elements
  An element can contain text
  There can be more than one element with the same tagname


XML Document is a Tree


XML documents are abstractly modeled as trees, as reflected by their nesting

Sometimes, XML documents are graphs (by using IDs and IDREFs to link elements)

[Tree diagram]
person
  name: Bart Simpson
  tel: 0131-444 7777
  tel: 078 4011 6022
  email: bart@ed.ac.uk
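
The tree view can be reproduced with a short recursive walk. This is only a sketch using Python's xml.etree.ElementTree; the function name outline is made up for this example.

```python
import xml.etree.ElementTree as ET


def outline(element, depth=0):
    """Return one indented line per element, visiting the tree depth-first."""
    text = (element.text or "").strip()
    line = "  " * depth + element.tag + (": " + text if text else "")
    lines = [line]
    for child in element:
        # Each child is one level deeper than its parent.
        lines.extend(outline(child, depth + 1))
    return lines


person = ET.fromstring(
    "<person><name>Bart Simpson</name><tel>0131-444 7777</tel>"
    "<tel>078 4011 6022</tel><email>bart@ed.ac.uk</email></person>"
)
tree_lines = outline(person)
print("\n".join(tree_lines))
```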


Elements Can Be Nested

<addresses>
  <person>
    <name>Donald Duck</name>
    <tel>0131-828 1345</tel>
    <tel>0131-828 1374</tel>
    <email>donald@macs.hw.ac.uk</email>
  </person>
  <person>
    <name>Mickey Mouse</name>
    <tel>0141-426 1142</tel>
  </person>
</addresses>


Semi-structured data

XML is ideal for semi-structured data
  If there is an extra telephone number, add it in
  If there is no email at all, leave it out

No need for empty fields or multiple tables.

In a corresponding database for up to 4 telephone numbers, the database design would include spaces for 4 numbers, or a separate phone number table.
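
From the reading side, semi-structured data simply means the program must expect repeated and missing elements. The sketch below, using the addresses example and Python's xml.etree.ElementTree, shows one way to cope with both; the summarise helper is hypothetical.

```python
import xml.etree.ElementTree as ET

ADDRESSES_XML = """
<addresses>
  <person>
    <name>Donald Duck</name>
    <tel>0131-828 1345</tel>
    <tel>0131-828 1374</tel>
    <email>donald@macs.hw.ac.uk</email>
  </person>
  <person>
    <name>Mickey Mouse</name>
    <tel>0141-426 1142</tel>
  </person>
</addresses>
"""


def summarise(person):
    """Collect one person's details, tolerating missing or repeated parts."""
    return {
        "name": person.findtext("name"),
        # findall returns however many <tel> elements there are, even none.
        "tels": [tel.text for tel in person.findall("tel")],
        # findtext returns None when the element is absent.
        "email": person.findtext("email"),
    }


people = [summarise(p) for p in ET.fromstring(ADDRESSES_XML).findall("person")]
for entry in people:
    print(entry)
```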


Attributes


An opening tag may contain attributes

These are typically used to describe the contents of an element

<entry>
  <word language="en">cheese</word>
  <word language="fr">fromage</word>
  <word language="ro">branza</word>
  <meaning>A food made …</meaning>
</entry>
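
Because the language attribute describes how each word should be interpreted, a program can use it to select the right element. A minimal sketch with Python's xml.etree.ElementTree follows; the meaning element is left out because its text is elided on the slide.

```python
import xml.etree.ElementTree as ET

ENTRY_XML = """
<entry>
  <word language="en">cheese</word>
  <word language="fr">fromage</word>
  <word language="ro">branza</word>
</entry>
"""

entry = ET.fromstring(ENTRY_XML)

# Build a lookup table from attribute value to element text.
by_language = {w.get("language"): w.text for w in entry.findall("word")}
print(by_language)

# Or select one element directly with an attribute predicate.
french = entry.find("word[@language='fr']").text
print(french)
```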


When to Use Attributes



It’s not always clear when to use attributes.

How should ssno (social security number, American) be stored?

As an attribute:

<person ssno="123 4589">
  <name>L. Simpson</name>
  <email>lisa@macs.hw.ac.uk</email>
  ...
</person>

As an element:

<person>
  <ssno>123 4567</ssno>
  <name>L. Simpson</name>
  <email>lisa@macs.hw.ac.uk</email>
  ...
</person>



When to Use Attributes


Using an attribute rather than elements might
make the structure more difficult to alter in the
future. In attributes:


Multiple values are not permitted


Tree structures are not permitted



General rule: avoid using attributes unless there is a good reason for using them


Use an attribute to describe how the data should
be interpreted (e.g. language, currency)


Use an attribute for “IDs”, i.e., identifying data
(covered later)



A Complete XML Document

<?xml version="1.0" encoding="UTF-8" ?>
<addresses>
  <person ssno="113">
    <name>Lisa Simpson</name>
    <tel>0131-828 1234</tel>
    <tel>078-4701 7775</tel>
    <email>lisa@macs.hw.ac.uk</email>
  </person>
</addresses>

Required


Empty element, and case


There is a special shortcut for tags that have only attributes, with no text or sub-elements in between them (empty element, bachelor tag)

<img src="myPic.jpg" />
instead of
<img src="myPic.jpg"></img>

XML is case-sensitive, i.e., the following are different: <person>, <Person>, <PERSON>



Well Formed Documents


A document is well-formed if:
  It has one top-level element (root element)
  Tags come in properly nested case-sensitive pairs
  Empty elements may use the accepted shortcut />
  Attribute values are enclosed in quotes
  Attribute names are not repeated within a tag
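
These rules can be tried out mechanically. The sketch below leans on the parser behind Python's xml.etree.ElementTree, which raises ParseError for input that is not well-formed. The helper name is_well_formed and the tiny test documents are invented for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text, False otherwise."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# One small document per rule; only the first should be accepted.
checks = {
    "single root": "<a><b/></a>",
    "two roots": "<a/><b/>",
    "case mismatch": "<Name>Homer</name>",
    "bad nesting": "<a><b></a></b>",
    "unquoted attribute": "<person phone=0131/>",
    "repeated attribute": '<person phone="1" phone="2"/>',
}

results = {label: is_well_formed(xml) for label, xml in checks.items()}
for label, ok in results.items():
    print(label, "->", "well-formed" if ok else "NOT well-formed")
```

A check like this only tests well-formedness; it says nothing about whether the document contains the elements an application expects.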





Are these valid XML files?

<?xml version="1.0"?>
<Question>Here is a question</Question>


<?xml version="1.0"?>
<Question>Here is a question</Question>
<Answer>Here is an answer</Answer>



Why is this not well-formed?

<?xml version="1.0" encoding="UTF-8" ?>
<person phone= 0131-828 1234
        phone=078-4701 7775 >
  <Name>
    <first>Homer
    <second>Simpson
    </first></second>
  </name>
<person phone= 0131-828 1235 >
  <Name>
    <first>Lisa
    <second>Simpson
    </first></second>
  </name>


XML Authoring


There are many authoring tools available to facilitate the creation of XML documents.
  Visual Studio for Windows is in the lab

However, you may as well start off using a simple text editor (not Word) which allows access to line numbers, ideally XML aware
  XML is after all just a text file.
  E.g. Notepad++ for Windows
  Most Linux text editors are OK

You are then responsible for checking that the XML is correct!


Viewing and checking XML


If well-formed XML is loaded into your browser it will be displayed as a tree structure

This is perhaps the simplest way to check that XML is well-formed




Viewing and checking XML


If incorrect XML is loaded into your browser then
error messages will be displayed




Exercise 1


An XML file holds information about holiday homes for rent. Write an example of such an XML file containing 2 or 3 records. Invent appropriate element and attribute names.
  Each home has an id, a name, a location and an optional URL
  Additionally, each home has one or more sets of contact details. Contact details consist of a name and a phone number, and optionally an email address.
  People do not own more than one holiday home.

In your example, demonstrate optional or repeated elements.

How would you hold this information in a relational database?


Referencing other elements


Unique elements (identified here by an attribute)
can be referred to from other elements


In this way, relationships between elements can
be shown without repetition


E.g.


Books and authors can be listed. But each book
may have >1 author, each author might write >1
book. So the book can contain a reference to the
author.
See books.xml

Extract from books.xml

<book bookID="222KK" year="2000">                      <!-- ** an id -->
  <title>Data on the Web</title>
  <Author>4</Author>                                   <!-- **** element references an id -->
  <Author>2</Author>
  <publisher>Morgan Kaufmann Publishers</publisher>
  <price>39.95</price>
</book>
.....
<author authID="4">                                    <!-- **** an id -->
  <firstName>Mary</firstName>
  <lastName>Thomson</lastName>
  <Book>222KK</Book>                                   <!-- ** element references an id -->
</author>

Asterisks show links between the data (in the same file)
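
Following such a reference is the application's job: the parser only sees the value 4, not a link. The sketch below shows one possible way to resolve the references with Python's xml.etree.ElementTree. The root element library is an assumption, since the extract does not show the real root of books.xml, and author 2 is deliberately left unresolved because it is not part of the extract.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: <library> is a hypothetical root element; the slide's
# extract does not show the actual root of books.xml.
BOOKS_XML = """
<library>
  <book bookID="222KK" year="2000">
    <title>Data on the Web</title>
    <Author>4</Author>
    <Author>2</Author>
    <publisher>Morgan Kaufmann Publishers</publisher>
    <price>39.95</price>
  </book>
  <author authID="4">
    <firstName>Mary</firstName>
    <lastName>Thomson</lastName>
    <Book>222KK</Book>
  </author>
</library>
"""

root = ET.fromstring(BOOKS_XML)

# Index the author elements by their identifying attribute.
authors_by_id = {a.get("authID"): a for a in root.findall("author")}


def author_names(book):
    """Resolve each <Author> reference in a book to a readable name."""
    names = []
    for ref in book.findall("Author"):
        author = authors_by_id.get(ref.text)
        if author is None:
            # The referenced author is not present in this extract.
            names.append("(author " + ref.text + " not in extract)")
        else:
            first = author.findtext("firstName")
            last = author.findtext("lastName")
            names.append(first + " " + last)
    return names


book = root.find("book[@bookID='222KK']")
names = author_names(book)
print(book.findtext("title"), "by", names)
```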


Exercise 2 (using ids)

An XML file holds information about holiday homes for rent. Write an example of such an XML file containing 2 or 3 records. Invent appropriate element and attribute names. Use books.xml as an example.
  Each home has an id, a name, a location and an optional URL
  Each contact has a name, phone and optional email address
  Each person can own many homes
  Each home can be owned by more than one person

How would you hold this information in a relational database?



Defining the structure of an XML file


We can check if an XML file is well-formed
  By looking at it, maybe
  By loading it into a browser
    If well-formed, it will be displayed

However, how can we check that the well-formed file contains the correct elements in the correct quantities? E.g.
  Mustn’t contain tagnames that aren’t expected
  Must contain tagnames that are expected
  Must contain the correct number of tags with the same tagname

We need to write a specification for the XML file
  See the next lecture