NETS 212: Scalable and Cloud Computing


Dec 13, 2013 (3 years and 8 months ago)


© 2013 A. Haeberlen, Z. Ives


University of Pennsylvania

Web services and XML


October 30, 2013


Announcements


HW3 is due tomorrow


You should be running the EMR job by now!



HW4 will be available by the end of this week


Completely new assignment, based on Node.js


Will cover some Node.js basics in tomorrow's lecture



Draft of final project handout is available


Please form teams by the end of this week!


One (!) member of your team should send me an email with
the names of both team members




A few words on the project


The team project is worth 30% of your grade


This can make a huge difference


There are lots of opportunities for extra credit


You can also propose your own additional features


Some advice:


Get the basic features working first


!!! Start early !!!

So far: Server side


What is a web application like when it is built
exclusively with CGI, servlets, or Node apps?


Slow! Every time the user clicks on something, the browser must load a new page.


But then how can modern web applications (Gmail, Google
Maps, Facebook, ...) be so fast?


Full explanation will take most of the next two lectures


[Figure labels: User, Internet]

The master plan

[Figure: Ajax as a mix of many technologies. Labels: Ajax, XMLHttpRequest, XML, Web services, DTDs & XML Schema, XPath, XSLT, Google Web Toolkit, SOAP, REST, HTML & CSS, DOM, JavaScript. A "Today" marker highlights part of the diagram.]

Plan for today


Web services


Definition


Data interchange problem


Extensible Markup Language (XML)


DTDs and XML Schema; DOM


What is a web service?


Intuition: An application that is accessible to
other applications over the web


Examples: Google Search, Google Maps API, Facebook
Graph API, eBay APIs, Amazon Web Services, ...


[Figure: Alice, Bob, Charlie, and a map service (used by Alice). A web page combines data from different sources ('mashup').]

A more detailed definition


Key elements:


Machine-to-machine interaction


Interoperable (with other applications and services)


Machine-processable format


Key technologies:


SOAP (and also REST, both of which we've already seen)


WSDL (Web Services Description Language; XML-based)


"A Web service is a software system designed to support interoperable machine-to-machine interaction over a network. It has an interface described in a machine-processable format (specifically WSDL). Other systems interact with the Web service in a manner prescribed by its description using SOAP messages, typically conveyed using HTTP with an XML serialization in conjunction with other Web-related standards."

http://www.w3.org/TR/ws-arch/

A key challenge


Nodes need to communicate with each other


E.g., using remote procedure calls


Network messages are strings of bytes


No particular structure; it must be defined by the application


Sender marshals the data and produces a string of bytes


Pointers must be encoded somehow


Specific byte order; metadata to describe the data


Receiver unmarshals the data again
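
The marshalling steps above can be sketched in a few lines. This is only an illustration, assuming a made-up wire format (4-byte big-endian integer, 2-byte length, UTF-8 bytes); the function names `marshal` and `unmarshal` are hypothetical and not part of any RPC library.

```python
import struct

HEADER = ">IH"  # assumed layout: big-endian 4-byte int, then 2-byte string length


def marshal(record_id, name):
    """Turn (record_id, name) into a string of bytes with a fixed byte order."""
    encoded = name.encode("utf-8")
    return struct.pack(HEADER, record_id, len(encoded)) + encoded


def unmarshal(data):
    """Recover (record_id, name) from bytes produced by marshal()."""
    record_id, length = struct.unpack_from(HEADER, data, 0)
    start = struct.calcsize(HEADER)
    return record_id, data[start:start + length].decode("utf-8")


wire = marshal(17, "Alice")
print(wire.hex(" "))    # the raw bytes that would go over the network
print(unmarshal(wire))  # the receiver gets the original values back

# The same integer looks different on the wire depending on byte order:
print(struct.pack(">I", 17).hex(" "), "vs", struct.pack("<I", 17).hex(" "))
```

Sender and receiver must agree on `HEADER` out of band; that agreement is exactly the metadata problem the slide describes.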

[Figure: Marshalling turns the value 17 into the byte string 01 17 02 48 3F 12 9E ...; unmarshalling turns it back into 17]

Data interchange is hard


What does Bob need to know to understand
Alice's document?


Physical data model (data encoding)


Code: ASCII or Unicode or ...?


Byte order: Little-endian? Big-endian?


Marshalling format: Tagged? Fixed? Which field sizes?


Logical data model (data representation)


Semantic heterogeneity


Imprecise and ambiguous values or descriptions


...

[Figure: Alice sends the bytes 01 17 02 48 3F 12 ... to Bob]

Data comes in many formats


What happens if we interconnect different machines?

Data type        Formats
Text             ASCII, Word document, RTF, TeX, PDF, HTML, ...
Database         MySQL, Oracle, Access, Works, OpenOffice, ...
Image            JPG, GIF, BMP, PNG, RAW, TIFF, Corel, Photoshop, ...
Music            AIFF, MP3, AAC, RA, Ogg, MID, MOD, SWA, ...
Video            AVI, M4V, MPEG, Ogg, WMV, RM, DVD, MOV, ...
Scientific data  Probably at least as many as there are researchers

Example: ID3v1 tags in MP3

Offs  Len  Description
   0    3  Identifier: "TAG"
   3   30  Song title string
  33   30  Artist string
  63   30  Album string
  93    4  Year string
  97   28  Comment string
 125    1  Zero byte separator
 126    1  Track byte
 127    1  Genre byte

...
006d3720 da 00 54 41 47 4d 65 6d 62 65 72 73 20 4f 6e 6c
006d3730 79 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
006d3740 00 00 00 53 68 65 72 79 6c 20 43 72 6f 77 00 00
006d3750 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
006d3760 00 54 68 65 20 47 6c 6f 62 65 20 53 65 73 73 69
006d3770 6f 6e 73 00 00 00 00 00 00 00 00 00 00 00 00 31
006d3780 39 39 38 00 00 00 00 00 00 00 00 00 00 00 00 00
006d3790 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
006d37a0 0a ff

Annotations on the hex dump: "TAG" identifier; title "Members Only"; artist "Sheryl Crow"; album "The Globe Sessions"; year "1998"; track #10; genre not specified
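
A fixed-offset layout like this one maps directly onto a `struct` format string. The sketch below rebuilds the 128-byte tag from the values in the hex dump instead of reading a real MP3 file; the helper names `pad`, `text`, and `parse_id3v1` are made up for illustration.

```python
import struct

# One field per table row: 3 + 30 + 30 + 30 + 4 + 28 + 1 + 1 + 1 = 128 bytes
ID3V1 = struct.Struct("3s30s30s30s4s28sBBB")


def text(raw):
    """Strip the zero padding from a fixed-width string field."""
    return raw.split(b"\x00", 1)[0].decode("latin-1")


def parse_id3v1(tag):
    ident, title, artist, album, year, _comment, zero, track, genre = ID3V1.unpack(tag)
    if ident != b"TAG":
        raise ValueError("not an ID3v1 tag")
    return {
        "title": text(title),
        "artist": text(artist),
        "album": text(album),
        "year": text(year),
        "track": track if zero == 0 else None,  # track byte only counts after a zero byte
        "genre": genre,
    }


def pad(s, width):
    """Encode a string and pad it with zero bytes to the field width."""
    return s.encode("latin-1").ljust(width, b"\x00")


# Same field values as in the hex dump above
sample = (b"TAG" + pad("Members Only", 30) + pad("Sheryl Crow", 30)
          + pad("The Globe Sessions", 30) + b"1998" + bytes(28)
          + bytes([0, 10, 255]))
info = parse_id3v1(sample)
print(info)
```

Nothing in the bytes themselves says where a field starts or what it means; a reader without the table cannot interpret the tag.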


Example: JPEG header

JPEG “JFIF” header:


Start of Image (SOI) marker -- two bytes (FFD8)

JFIF marker (FFE0)

length -- two bytes

identifier -- five bytes: 4A, 46, 49, 46, 00 (the ASCII code equivalent of a zero-terminated "JFIF" string)

version -- two bytes: often 01, 02
  the most significant byte is used for major revisions
  the least significant byte for minor revisions

units -- one byte: units for the X and Y densities
  0 => no units, X and Y specify the pixel aspect ratio
  1 => X and Y are dots per inch
  2 => X and Y are dots per cm

Xdensity -- two bytes

Ydensity -- two bytes

Xthumbnail -- one byte: 0 = no thumbnail

Ythumbnail -- one byte: 0 = no thumbnail

(RGB)n -- 3n bytes: packed (24-bit) RGB values for the thumbnail pixels, n = Xthumbnail * Ythumbnail
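
The header fields listed above can be decoded the same way. This is a sketch on a hand-built 20-byte sample (72 dpi, no thumbnail), not on a real image; `parse_jfif_header` is a hypothetical helper, and multi-byte JPEG fields are read as big-endian.

```python
import struct

# SOI, APP0 marker, length, identifier, version (major, minor), units,
# Xdensity, Ydensity, Xthumbnail, Ythumbnail
JFIF = struct.Struct(">HHH5sBBBHHBB")


def parse_jfif_header(data):
    (soi, app0, length, ident, major, minor, units,
     xdensity, ydensity, xthumb, ythumb) = JFIF.unpack_from(data, 0)
    if soi != 0xFFD8 or app0 != 0xFFE0 or ident != b"JFIF\x00":
        raise ValueError("not a JFIF header")
    return {
        "length": length,
        "version": (major, minor),
        "units": units,
        "xdensity": xdensity,
        "ydensity": ydensity,
        "thumbnail": (xthumb, ythumb),
    }


# Assumed sample bytes: version 1.2, units = dots per inch, 72 x 72, no thumbnail
sample = bytes.fromhex("ffd8 ffe0 0010 4a46494600 0102 01 0048 0048 00 00")
header = parse_jfif_header(sample)
print(header)
```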


Problem: Too many formats


You need to look into a manual to find a specific file format


http://en.wikipedia.org/wiki/List_of_file_formats


http://www.wotsit.org/


Automating data exchange is very hard


O(N^2) problem: Everyone needs to understand everyone else's data format


The web is about making data exchange easier... maybe we can do better?


Goal: "The mother of all file formats"
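
To see why the O(N^2) growth matters, one can count converters. The sketch assumes one converter per ordered pair of formats in the direct case, and one reader plus one writer per format when everything goes through a single shared format; both function names are made up.

```python
def direct_converters(n):
    """Every format needs a converter to every other format."""
    return n * (n - 1)


def hub_converters(n):
    """Every format only converts to and from one shared format."""
    return 2 * n


for n in (3, 10, 100):
    print(n, "formats:", direct_converters(n), "direct vs", hub_converters(n), "via a shared format")
```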

http://xkcd.com/927/

Desiderata for data interchange


Ability to represent many kinds of information


Different data structures



Hardware-independent encoding


Endianness, UTF vs. ASCII vs. EBCDIC


Standard tools and interfaces


Ability to define “shape” of expected data


With forward and backward compatibility!



That’s XML…


Recap: Web services


Idea: Application accessible to other apps


Can combine data from multiple sources in a 'mashup'


Key technologies: SOAP/REST, WSDL



A key problem: Data interchange


Receiver needs to understand data encoding+representation


There is a huge number of formats today


This complicates data exchange enormously


Could be solved with some kind of über-format (XML)

Plan for today


Web services


Definition


Data interchange problem


Extensible Markup Language (XML)


Data model


Encoding data in XML


Namespaces


Well-formed and valid


DTDs and XML Schema; DOM

Extensible Markup Language (XML)


What is it?


A set of rules for encoding documents


A subset of SGML



Who uses it?


Document Object Model (DOM) -- OO representation of XML


Simple API for XML (SAX) -- event-driven parser for XML


Ant -- Java's 'make' tool, whose 'Makefile' uses XML


XPath, XQuery, XSL, XSLT


Web service standards (e.g., SOAP)


Anything Ajax (stay tuned)


Microsoft Office


Example XML document


Structure is very similar to HTML


This is not an accident: both are derived from SGML

<?xml version="1.0" encoding="ISO-8859-1" ?>
<dblp>
  <mastersthesis mdate="2002-01-03" key="ms/Brown92">
    <author>Kurt P. Brown</author>
    <title>PRPL: A Database Workload Specification Language</title>
    <year>1992</year>
    <school>Univ. of Wisconsin-Madison</school>
  </mastersthesis>
  <article mdate="2002-01-03" key="tr/dec/SRC1997-018">
    <editor>Paul R. McJones</editor>
    <title>The 1995 SQL Reunion</title>
    <journal>Digital System Research Center Report</journal>
    <volume>SRC1997-018</volume>
    <year>1997</year>
    <ee>db/labs/dec/SRC1997-018.html</ee>
    <ee>http://www.mcjones.org/System_R/SQL_Reunion_95/</ee>
  </article>

Callouts: processing instruction, element, attribute, open and close tags (case-sensitive)
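
A short sketch of how a program sees elements, attributes, and text, using Python's standard `xml.etree.ElementTree` on a shortened copy of the example (the closing `</dblp>` tag is added here so that the snippet parses).

```python
import xml.etree.ElementTree as ET

doc = """<dblp>
  <mastersthesis mdate="2002-01-03" key="ms/Brown92">
    <author>Kurt P. Brown</author>
    <title>PRPL: A Database Workload Specification Language</title>
    <year>1992</year>
    <school>Univ. of Wisconsin-Madison</school>
  </mastersthesis>
</dblp>"""

root = ET.fromstring(doc)
thesis = root[0]                    # first child element of <dblp>
print(root.tag, thesis.tag)         # element names
print(thesis.attrib)                # attributes as a dictionary
for child in thesis:
    print(child.tag, "=", child.text)  # child elements and their text content

# Tags are case-sensitive: <Title> is not closed by </title>
try:
    ET.fromstring("<Title>x</title>")
    case_sensitive = False
except ET.ParseError:
    case_sensitive = True
print("case-sensitive:", case_sensitive)
```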

XML data model


To model an XML document, we need at least
the following ('XML information set'):


Document (root)


Element


Attribute


Processing instruction


Text (content)


Namespace


Comment


... and a few more obscure items


XML data model visualized

[Figure: tree view of the example document. The root has a processing-instruction child (?xml) and an element child (dblp). dblp contains the mastersthesis and article elements, each with attribute nodes (mdate, key) and child elements (author, title, year, school; editor, title, journal, volume, year, ee, ee). Text nodes such as "Kurt P....", "PRPL...", "1992", and "1997" are the leaves. Legend: root, p-i, element, attribute, text.]

A few common uses of XML


Serves as an extensible HTML


Allows custom tags (used, e.g., by MS Word, OpenOffice)


Supplement it with style sheets (XSL) to define formatting



Provides an exchange format for data


Tables, objects, ...


Still need to agree on terminology



Format for marshalled data in Web Services


Example: SOAP (see earlier lecture on AWS)


XML easily encodes relations

"student-course-grade" relation:

ID   Course    Grade
1    330-f13   B+
23   455-s13   A

<student-course-grade>
  <tuple>
    <sid>1</sid>
    <course>330-f13</course>
    <grade>B+</grade>
  </tuple>
  <tuple>
    <sid>23</sid>
    <course>455-s13</course>
    <grade>A</grade>
  </tuple>
</student-course-grade>
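
The encoding is mechanical: one element per tuple, one child element per column. A minimal sketch with `xml.etree.ElementTree`, assuming the rows arrive as Python dictionaries (`relation_to_xml` is a hypothetical helper):

```python
import xml.etree.ElementTree as ET

rows = [
    {"sid": 1, "course": "330-f13", "grade": "B+"},
    {"sid": 23, "course": "455-s13", "grade": "A"},
]


def relation_to_xml(name, rows):
    """Build <name><tuple><column>value</column>...</tuple>...</name>."""
    root = ET.Element(name)
    for row in rows:
        tup = ET.SubElement(root, "tuple")
        for column, value in row.items():
            ET.SubElement(tup, column).text = str(value)
    return root


xml_text = ET.tostring(relation_to_xml("student-course-grade", rows), encoding="unicode")
print(xml_text)
```

Note that every value becomes text; the receiver still has to know that `sid` is meant to be a number.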

XML also encodes objects


What do we do about the pointers?


Can be represented as IDs and indirection/references

[Figure: two project objects. cis455 -- Type: Programming, URL: cis455.com, Incorporates: (pointer to cis330), Members: Joan, Jill. cis330 -- Type: Other, URL: cis330.com, Incorporates: -, Members: Frank, Steven.]

<projects>
  <project class="cis455">
    <type>Programming</type>
    <memberList>
      <teamMember>Joan</teamMember>
      <teamMember>Jill</teamMember>
    </memberList>
    <codeURL>www….</codeURL>
    <incProjectFrom class="cis330"/>
  </project>
  <project class="cis330">
    ...
  </project>
</projects>
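
On the receiving side, such ID-style references have to be turned back into pointers by a lookup. A sketch under the assumption that the `class` attribute acts as the identifier; the contents of the cis330 project are filled in from the figure for illustration only, and `incorporated` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

doc = """<projects>
  <project class="cis455">
    <type>Programming</type>
    <memberList><teamMember>Joan</teamMember><teamMember>Jill</teamMember></memberList>
    <incProjectFrom class="cis330"/>
  </project>
  <project class="cis330">
    <type>Other</type>
    <memberList><teamMember>Frank</teamMember><teamMember>Steven</teamMember></memberList>
  </project>
</projects>"""

root = ET.fromstring(doc)

# Index every project by its identifier so references can be followed
by_id = {p.get("class"): p for p in root.findall("project")}


def incorporated(project):
    """Follow each <incProjectFrom> reference to the project it names."""
    return [by_id[ref.get("class")] for ref in project.findall("incProjectFrom")]


target = incorporated(by_id["cis455"])[0]
members = [m.text for m in target.iter("teamMember")]
print(target.get("class"), members)
```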

XML and code


Some web services use XML to encode
messages, e.g., for remote procedure calls


Example: SOAP + WSDL (remember AWS lecture)


Sender marshals parameters into XML



Pros and cons?


Easy to be forward compatible


Easy to read and validate (?)


At least lots of tools available


Generally compatible with firewalls


Drawback: XML is verbose and not an efficient encoding


But, when we are sending only 100s of bytes, does it really matter?


Handling name clashes


What if a tag is used by multiple sources?


Example: XML document with book reviews that includes
HTML for display formatting

<html>
  <head><title>Book Review</title></head>
  <body>
    <bookreview>
      <title>XML: A Primer</title>
      <table>
        <tr align="center">
          <td>Author</td><td>Price</td>
          <td>Pages</td><td>Date</td>
        </tr>
        <tr align="left">
          <td><author>Simon St. Laurent</author></td>
          <td><price>31.98</price></td>
          <td><pages>352</pages></td>
          <td><date>1998/01</date></td>
        </tr>
      </table>
    </bookreview>
  </body>
</html>

http://www.xml.com/pub/a/1999/01/namespaces.html

Callout on the two <title> elements: Confuses software that is parsing this document (book title or page title?)

XML namespaces


Solution: XML namespaces


Part 1: Bind namespaces to URIs


Part 2: Qualified names

<html xmlns="http://www.w3.org/HTML/1998/html4"
      xmlns:xdc="http://www.xml.com/books">
  <head><title>Book Review</title></head>
  <body>
    <xdc:bookreview>
      <xdc:title>XML: A Primer</xdc:title>
      <table>
        <tr align="center">
          <td>Author</td><td>Price</td>
          <td>Pages</td><td>Date</td>
        </tr>
        <tr align="left">
          <td><xdc:author>Simon St. Laurent</xdc:author></td>
          <td><xdc:price>31.98</xdc:price></td>
          <td><xdc:pages>352</xdc:pages></td>
          <td><xdc:date>1998/01</xdc:date></td>
        </tr>
      </table>
    </xdc:bookreview>
  </body>
</html>

http://www.xml.com/pub/a/1999/01/namespaces.html

Callouts: default namespace (to avoid cluttering the document); qualified names
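
With namespaces, a parser can tell the two `title` elements apart. A sketch with `xml.etree.ElementTree` on a shortened version of the document; the prefix `h` in the lookup table is an arbitrary local choice, since only the URIs matter.

```python
import xml.etree.ElementTree as ET

doc = """<html xmlns="http://www.w3.org/HTML/1998/html4"
      xmlns:xdc="http://www.xml.com/books">
  <head><title>Book Review</title></head>
  <body>
    <xdc:bookreview>
      <xdc:title>XML: A Primer</xdc:title>
    </xdc:bookreview>
  </body>
</html>"""

# Prefixes used in queries are local to this program; they map to the same URIs
ns = {"h": "http://www.w3.org/HTML/1998/html4", "xdc": "http://www.xml.com/books"}

root = ET.fromstring(doc)
page_title = root.find("h:head/h:title", ns).text
book_title = root.find(".//xdc:title", ns).text

print(root.tag)      # the parser expands names to {namespace-URI}local-name
print(page_title)
print(book_title)
```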

XML is not enough on its own


Too unconstrained for many cases!


How will we know when we're getting garbage?


How will we query data in an XML document?


How will we understand the data we've got?



Well-formed and valid


How will we know whether a document is OK?


Idea: Check whether begin and end tags are correctly nested, special characters (<, &) are properly used, etc.


If these (and a few other) conditions hold, the document is well-formed


But is the document valid, i.e., is the structure okay?


Need some form of specification for valid documents

Not well-formed:

<addresses>
  <address id="1" ancestor="2">
    <name>Andreas Haeberlen
    <street>3330 Walnut St
  </address>
  <address id="2">
    <name>Ben Franklin</street>
    <street>834 Chestnut St</name>
  </address>
</addresses>

Well-formed, but not valid:

<addresses>
  <address id="1" ancestor="2">
    <name>Andreas Haeberlen</name>
    <street>3330 Walnut St</street>
  </address>
  <name id="2">
    <name>Ben Franklin</name>
    <street>834 Chestnut St</street>
  </name>
</addresses>
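
Any XML parser checks well-formedness as a side effect of parsing. A sketch using `xml.etree.ElementTree`; it says nothing about validity, which needs a DTD or schema. The helper name `is_well_formed` is made up.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """True if the text parses as XML; structure is not checked against any DTD."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


unclosed = """<addresses>
  <address id="1" ancestor="2">
    <name>Andreas Haeberlen
    <street>3330 Walnut St
  </address>
</addresses>"""

odd_structure = """<addresses>
  <name id="2">
    <name>Ben Franklin</name>
    <street>834 Chestnut St</street>
  </name>
</addresses>"""

print(is_well_formed(unclosed))        # tags are not properly nested
print(is_well_formed(odd_structure))   # parses fine, even though the structure is odd
```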

Recap: XML


Textual data format


Familiar from HTML: Tags, elements, attributes, ...


Can encode complex data structures: Relations, pointers, ...


Used as a data exchange format, for marshalling, ...


Namespaces to help with name clashes



Not enough on its own


Need a way to detect whether a document is valid/well-formed


Need a way to query data in XML documents


Need a way to understand/work with the data




Plan for today


Web services


Extensible Markup Language (XML)


Data model


Encoding data in XML


Namespaces


Well-formed and valid


DTDs and XML Schema; DOM


Document Type Definitions (DTDs)


XML Schema


Document Object Model (DOM)


XPath


XSLT


Document Type Definitions (DTDs)


A DTD is an EBNF grammar that defines the
structure of an XML document


XML document references an associated DTD, plus the root element


DTD specifies the children of the root, and so on


my.dtd:

<!ELEMENT addresses ANY>
<!ELEMENT address (name, street)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT street (#PCDATA)>
<!ATTLIST address id ID #REQUIRED>
<!ATTLIST address ancestor IDREF #IMPLIED>

addr.xml:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE addresses SYSTEM "my.dtd">
<addresses>
  ...
</addresses>

IDs and IDREFs


How can we tell that this document is invalid?


DTD attributes can be identifiers:


ID: Special attribute, analogous to keys for elements


IDREF: Reference to an ID


IDREFS: Space-delimited list of IDREFs


<!ELEMENT addresses ANY>
<!ELEMENT address (name, street)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT street (#PCDATA)>
<!ATTLIST address id ID #REQUIRED>
<!ATTLIST address ancestor IDREF #IMPLIED>

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE addresses SYSTEM "my.dtd">
<addresses>
  <address id="1" ancestor="3">
    <name>Andreas Haeberlen</name>
    <street>3330 Walnut St</street>
  </address>
  <address id="2">
    <name>Ben Franklin</name>
    <street>834 Chestnut St</street>
  </address>
</addresses>
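
The problem in this document is the reference `ancestor="3"`, which points to an ID that no element declares. A validating parser would report this from the DTD. The sketch below is not a DTD validator; it hard-codes the attribute names (`id`, `ancestor`) and only illustrates the referential-integrity check, with `dangling_references` as a hypothetical helper.

```python
import xml.etree.ElementTree as ET

doc = """<addresses>
  <address id="1" ancestor="3">
    <name>Andreas Haeberlen</name>
    <street>3330 Walnut St</street>
  </address>
  <address id="2">
    <name>Ben Franklin</name>
    <street>834 Chestnut St</street>
  </address>
</addresses>"""


def dangling_references(root, id_attr="id", ref_attr="ancestor"):
    """Return every reference value that does not match a declared ID."""
    ids = {e.get(id_attr) for e in root.iter() if e.get(id_attr) is not None}
    return [e.get(ref_attr) for e in root.iter()
            if e.get(ref_attr) is not None and e.get(ref_attr) not in ids]


broken = dangling_references(ET.fromstring(doc))
print(broken)
```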

Limitations of DTDs


DTDs capture the grammatical structure, but
have several drawbacks:


Attach a fixed meaning to an element, regardless of context


Difficult to express negation


Example: Footnote may contain arbitrary text but no other footnotes


No data types for element content


CDATA, #PCDATA


Global ID/reference space is inconvenient


No way to define OO-style inheritance

XML Schema: DTDs rethought


Features:


XML Syntax


Better way of defining keys using XPaths


Type subclassing


... and, of course, built-in datatypes

Example XML Schema

<?xml version="1.0" encoding="utf-8"?>
<xs:schema elementFormDefault="qualified"
           xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Address">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Recipient" type="xs:string" />
        <xs:element name="Street" type="xs:string" />
        <xs:element name="Town" type="xs:string" />
        <xs:element name="County" type="xs:string" minOccurs="0" />
        <xs:element name="PostCode" type="xs:string" />
        <xs:element name="POBox" type="xs:boolean" />
        <xs:element name="Since" type="xs:date" />
        <xs:element name="Country">
          <xs:simpleType>
            <xs:restriction base="xs:string">
              <xs:enumeration value="FR" />
              <xs:enumeration value="DE" />
              <xs:enumeration value="UK" />
              <xs:enumeration value="US" />
            </xs:restriction>
          </xs:simpleType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

http://en.wikipedia.org/wiki/XML_Schema_%28W3C%29

Callouts: actual data types; structured type; elements can have minOccurs, maxOccurs

Basic constructs of Schema


Separation of elements (and attributes)
from types:


complexType is a structured type, which can have
sequences or choices


element and attribute have name and type; elements
may also have minOccurs and maxOccurs



Subtyping, most commonly using

<complexContent>
  <extension base="prevType">
    ...
  </...>

Designing an XML schema or DTD


Often we are given an existing DTD or schema


Example: HTML DTD



If not, we need to design one


What would be a good approach?


Idea: Orient the XML tree around the 'central' objects in the application of interest

Manipulating XML documents


Typical tasks:


Restructure an XML document


Add/remove/modify elements


Retrieve certain elements that satisfy some constraint


Examples: All books, all addresses in New Hampshire



How do we do this in a program?


Need an interface that allows programs and scripts to dynamically access and update the content, structure, and style of documents


Solution: The Document Object Model (DOM)

The Document Object Model


Document components represented by objects


Objects have methods like getFirstChild(), getNextSibling(), ..., which can be used to traverse the tree


Can also modify the tree, and thus alter the XML, via insertBefore(), etc.

[Figure: an XML parser turns the XML document into the tree of objects shown earlier (root, p-i, element, attribute, and text nodes)]
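
A sketch of DOM-style traversal and modification with Python's standard `xml.dom.minidom`. In Python the navigation calls are attributes (`firstChild`, `nextSibling`) rather than Java-style getters; the document is a shortened, whitespace-free copy of the earlier example so that no whitespace text nodes get in the way.

```python
from xml.dom import minidom

dom = minidom.parseString(
    "<dblp><article><title>The 1995 SQL Reunion</title><year>1997</year></article></dblp>"
)
article = dom.documentElement.firstChild

# Traverse the children of <article> via firstChild / nextSibling
tags = []
node = article.firstChild
while node is not None:
    tags.append(node.tagName)
    node = node.nextSibling
print(tags)

# Modify the tree: insert a new <editor> element before the first child
editor = dom.createElement("editor")
editor.appendChild(dom.createTextNode("Paul R. McJones"))
article.insertBefore(editor, article.firstChild)

result = dom.documentElement.toxml()
print(result)
```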

Isn't there an easier way?


What if we want to find all the author nodes,
or all the title nodes that contain 'scalable'?


Coding this manually can be quite cumbersome: need to traverse the entire tree, keep track of conditions, ...


Alternative: A query language


Idea: Declaratively describe the nodes we're interested in, and let a query engine do all the hard work


This can be done with XPath


Example: //title[contains(text(),'scalable')]


May return a set of nodes (not just one)
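
Python's `xml.etree.ElementTree` supports only a subset of XPath: path expressions such as `.//author` work, but functions like `contains()` do not, so the text condition is applied in ordinary code here. The two articles and their authors are made-up sample data.

```python
import xml.etree.ElementTree as ET

doc = """<dblp>
  <article><author>A. Author</author><title>Building scalable services</title></article>
  <article><author>B. Author</author><title>The 1995 SQL Reunion</title></article>
</dblp>"""

root = ET.fromstring(doc)

# Declarative part: "all author elements anywhere below the root"
authors = [a.text for a in root.findall(".//author")]

# The contains() condition, applied by hand to every title element
matches = [t.text for t in root.iter("title") if "scalable" in (t.text or "")]

print(authors)
print(matches)
```

A full XPath engine would evaluate the whole query, including the predicate, in one expression.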

Recap: DTDs, XML Schema, DOM


Document Type Definitions (DTDs)


An EBNF grammar that defines the structure of an XML doc.


Special support for IDs and references


Several limitations, e.g., no proper data types, no subtypes



XML Schema


More expressive than DTDs; itself an XML document



Document Object Model (DOM)


An interface for accessing/changing XML data from programs


Document components are represented by objects


Remember JSON?


A lot of AJAX applications use JSON instead of XML (remember Lecture #4?)


How do the two compare?


{
  "firstName": "John",
  "lastName": "Smith",
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": 10021
  },
  "phoneNumber": [
    { "type": "home", "number": "212 555-1234" },
    { "type": "fax", "number": "646 555-4567" }
  ]
}
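
One way to compare the two is to parse a JSON value and re-encode it as XML. The sketch uses a shortened version of the record; the `to_xml` helper and the `item` tag for list entries are arbitrary choices, since XML has no built-in notion of arrays or numbers.

```python
import json
import xml.etree.ElementTree as ET

text = ('{"firstName": "John", "age": 25, '
        '"phoneNumber": [{"type": "home", "number": "212 555-1234"}]}')
person = json.loads(text)   # numbers, strings, lists, and objects come back typed


def to_xml(tag, value):
    """Encode a parsed JSON value as an XML element (one possible mapping)."""
    elem = ET.Element(tag)
    if isinstance(value, dict):
        for key, child in value.items():
            elem.append(to_xml(key, child))
    elif isinstance(value, list):
        for item in value:
            elem.append(to_xml("item", item))
    else:
        elem.text = str(value)   # type information is lost: everything becomes text
    return elem


xml_text = ET.tostring(to_xml("person", person), encoding="unicode")
print(type(person["age"]).__name__, person["age"])
print(xml_text)
```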

Stay tuned

Next time you will learn about:

Ajax and GWT

http://www.flickr.com/photos/arun-venkataswamy/8184630763/sizes/c/in/photostream/