
CIT 383: Administrative Scripting


XML


Topics

1. What is XML?

2. XML Structure

3. REXML



eXtensible Markup Language

Extensible descriptive markup language framework


Began as a subset of Standard Generalized Markup Language (SGML).

Goal: ensure that data remains available after the programs that originally created or read it become obsolete or unusable.

<?xml version="1.0" encoding="UTF-8"?>
<inventory>
  <book isbn="0976694042">
    <author>Chris Pine</author>
    <title>Learn to Program</title>
  </book>
</inventory>
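
Because the markup names each piece of data, a short script can read the inventory back long after the program that wrote it is gone. The following is a minimal sketch, assuming REXML is installed (it ships with Ruby, as a bundled gem in recent versions). The `books` helper and the `INVENTORY_XML` constant are names made up for this example.

```ruby
require 'rexml/document'

INVENTORY_XML = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <inventory>
    <book isbn="0976694042">
      <author>Chris Pine</author>
      <title>Learn to Program</title>
    </book>
  </inventory>
XML

# Hypothetical helper: turn each <book> element into a plain Ruby hash.
def books(xml)
  doc = REXML::Document.new(xml)
  doc.elements.to_a('inventory/book').map do |book|
    {
      isbn:   book.attributes['isbn'],
      author: book.elements['author'].text,
      title:  book.elements['title'].text
    }
  end
end

books(INVENTORY_XML).each do |b|
  puts "#{b[:title]} by #{b[:author]} (ISBN #{b[:isbn]})"
end
```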


Descriptive vs Presentational

Presentational languages describe how documents should look.

<b>text</b> turns on boldface for text

What if you want to change book titles from bold to italics?

Search-and-replace won’t work if items other than books are bold.

Descriptive languages focus on the meaning.

<title>xml and you</title>

Stylesheets describe how to present logical items.

Can also be used purely for data storage and interchange.

Also known as logical or structural markup languages.
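
The bold-to-italics problem can be sketched in a few lines of Ruby. This is only an illustration of the idea, not a real stylesheet language: the hashes below are hypothetical stand-ins for a stylesheet, the `render` helper is invented for this example, and the author name is a placeholder.

```ruby
require 'rexml/document'

# Hypothetical "stylesheets": map logical element names to presentational tags.
BOLD_TITLES   = { 'title' => 'b' }
ITALIC_TITLES = { 'title' => 'i' }

# Render the children of the root element, wrapping each one in the
# presentational tag the stylesheet assigns to it (or in nothing at all).
def render(xml, stylesheet)
  doc = REXML::Document.new(xml)
  doc.root.elements.to_a.map do |element|
    tag = stylesheet[element.name]
    tag ? "<#{tag}>#{element.text}</#{tag}>" : element.text
  end.join(' ')
end

# Placeholder data; the author name is made up.
BOOK = '<book><title>xml and you</title><author>A. Writer</author></book>'

puts render(BOOK, BOLD_TITLES)
puts render(BOOK, ITALIC_TITLES)
```

Switching every title from bold to italics means changing one stylesheet entry; the document itself is untouched, and other bold items are unaffected.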


XML-based Languages


Ant


Atom


CML


MathML


MML


MusicXML


ODF


OPML


RDF


SAML


SOAP


SVG


VoiceXML


WML


XHTML


XUL


Evolution of XML

1986 SGML standard published as ISO 8879

1987 Unicode proposal published

1991 First volume of Unicode standard

1996 XML work started

1998 XML 1.0 released as a W3C standard

2001 XML Schema language

2004 XML 1.1 released (not widely used)

2007 Unicode 5.0 published


XML Tree Structure

<todo>
  <title>Monday’s List</title>
  <item>Study for midterm</item>
  <item>
    Scripting Class
    <priority>10</priority>
  </item>
  <item>Bathe cat</item>
</todo>

todo
  title: Monday’s List
  item: Study for midterm
  item: Scripting Class
    priority: 10
  item: Bathe cat
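
The tree view of a document like the to-do list can be produced with a short recursive walk. The sketch below assumes the priority is written as a child element (`<priority>10</priority>`), which is one well-formed reading of the slide. `outline` is a hypothetical helper name.

```ruby
require 'rexml/document'

TODO_XML = <<~XML
  <todo>
    <title>Monday's List</title>
    <item>Study for midterm</item>
    <item>Scripting Class<priority>10</priority></item>
    <item>Bathe cat</item>
  </todo>
XML

# Return one indented line per element, depth-first, mirroring the tree diagram.
def outline(element, depth = 0)
  text = element.text.to_s.strip          # first text node, if any
  line = '  ' * depth + element.name
  line += ": #{text}" unless text.empty?
  [line] + element.elements.to_a.flat_map { |child| outline(child, depth + 1) }
end

puts outline(REXML::Document.new(TODO_XML).root)
```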


Elements and Attributes

An element consists of tags and contents

<title>Learn to Program</title>

Begin and end tags are mandatory; an element with no content may use a single empty-element tag.

<isbn number="0976694042" />

Attributes

number="0976694042"

Elements may have zero or more attributes.

Attribute values must always be quoted.
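
In REXML, these pieces map onto a few accessors: `name`, `text`, and `attributes`. A minimal sketch, assuming the two example elements are wrapped in a `<book>` root (the wrapper and the `describe` helper are made up for this illustration):

```ruby
require 'rexml/document'

BOOK_DOC = REXML::Document.new(
  '<book><title>Learn to Program</title><isbn number="0976694042"/></book>'
)

# Summarize one element: its tag name, its text content, and its attributes.
def describe(element)
  attrs = {}
  element.attributes.each { |name, value| attrs[name] = value }
  { name: element.name, text: element.text, attributes: attrs }
end

BOOK_DOC.root.elements.each { |e| p describe(e) }
```

The empty `<isbn/>` element has no text, so `text` returns `nil` for it, while `<title>` has text but no attributes.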


Text

XML declaration specifies character encoding

<?xml version="1.0" encoding="UTF-8"?>

Encodings

Unicode: universal character set; encodings include UTF-8 and UTF-32

ISO-8859: 8-bit encodings; 8859-1 covers Western Europe

Entities

&#nnnn; encodes the specified Unicode character

&name; is a named character entity, such as

&lt; is <

&gt; is >

&amp; is &

Other named entities (currency symbols, fractions, Greek letters, math symbols, etc.) must be declared in a DTD, as XHTML does.
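
A parser resolves these references when reading and escapes markup characters again when writing. The sketch below shows both directions with REXML; `parsed_text` and `escaped` are hypothetical helper names, and only the predefined entities and a numeric reference are used, so no DTD is needed.

```ruby
require 'rexml/document'

# Reading: references in the source are replaced by the characters they name.
def parsed_text(xml)
  REXML::Document.new(xml).root.text
end

# Writing: REXML escapes markup characters when a text node is serialized.
def escaped(text)
  REXML::Text.new(text).to_s
end

puts parsed_text('<t>&lt;b&gt; &amp; &#169;</t>')
puts escaped('1 < 2 & 3 > 2')
```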


XML Syntax Rules

1. There is one and only one root tag.

2. Every begin tag must be matched by an end tag.

3. XML tags must be properly nested.

4. XML tags are case sensitive.

5. All attribute values must be quoted.

6. Whitespace within element content is part of the text.

7. Newlines are always stored as LF.

8. HTML-style comments: <!-- comment -->
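
Several of these rules can be checked by simply asking a parser to read the document. The following sketch uses REXML and treats a `REXML::ParseException` as "not well-formed". `well_formed?` is a hypothetical helper, and REXML may be more lenient than the specification about some rules, so this is an approximation rather than a conformance test.

```ruby
require 'rexml/document'

# True when REXML can parse the string, false when it reports a syntax error.
def well_formed?(xml)
  REXML::Document.new(xml)
  true
rescue REXML::ParseException
  false
end

puts well_formed?('<list><item>ok</item></list>')    # tags matched and nested
puts well_formed?('<list><item>bad</list></item>')   # rule 3: improper nesting
puts well_formed?('<list><Item>bad</item></list>')   # rule 4: case mismatch
```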


Correctness

Well-formed

Conforms to XML syntax rules.

A conforming parser will not parse documents that are not well-formed.

Valid

Conforms to XML semantics rules as defined in a Document Type Definition (DTD) or an XML Schema.

A validating parser will not parse invalid documents.
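
The difference between the two levels can be illustrated with a document that parses cleanly but breaks an application rule. REXML does not validate against a DTD or schema, so the sketch below hand-codes one hypothetical rule (every `<book>` needs an `isbn` attribute and exactly one `<title>`) as a stand-in for a real schema. `validity_errors` is an invented name.

```ruby
require 'rexml/document'

# Hypothetical validity rule standing in for a DTD or schema:
# every <book> must have an isbn attribute and exactly one <title>.
def validity_errors(xml)
  doc = REXML::Document.new(xml)   # raises ParseException if not well-formed
  errors = []
  doc.elements.each('inventory/book') do |book|
    errors << 'book without isbn attribute' unless book.attributes['isbn']
    titles = book.elements.to_a('title').size
    errors << "book has #{titles} title elements, expected 1" unless titles == 1
  end
  errors
end

# Well-formed, but invalid under the rule above.
INVALID_XML = '<inventory><book><author>Chris Pine</author></book></inventory>'
VALID_XML   = '<inventory><book isbn="0976694042"><title>Learn to Program</title></book></inventory>'

puts validity_errors(INVALID_XML)
puts validity_errors(VALID_XML).empty?
```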


XML Schema Languages

Document Type Definitions

Inherited from SGML.

Do not support all XML features (e.g., namespaces and data types).

XML Schema

Most commonly used.

Schemas are XML docs.

Also known as WXS, XSD.

RELAX NG

REgular LAnguage for XML Next Generation

Has both XML and non-XML (compact) forms.

<?xml version="1.0" encoding="utf-8" ?>
<xs:schema elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Address">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Recipient" type="xs:string" />
        <xs:element name="House" type="xs:string" />
        <xs:element name="Street" type="xs:string" />
        <xs:element name="Town" type="xs:string" />
        <xs:element minOccurs="0" name="County" type="xs:string" />
        <xs:element name="PostCode" type="xs:string" />
        <xs:element name="Country">
          <xs:simpleType>
            <xs:restriction base="xs:string">
              <xs:enumeration value="FR" />
              <xs:enumeration value="DE" />
              <xs:enumeration value="ES" />
              <xs:enumeration value="UK" />
              <xs:enumeration value="US" />
            </xs:restriction>
          </xs:simpleType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
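
Since a schema is itself an XML document, it can be inspected with the same tools as any other XML file. The sketch below does not validate anything; it only lists the `xs:element` declarations in a shortened copy of the schema and reports which ones are optional. `declared_elements` and `SAMPLE_XSD` are names invented for this example, and treating only `minOccurs="0"` as optional is a simplifying assumption.

```ruby
require 'rexml/document'

# Prefix-to-namespace mapping used by the XPath query below.
XS = { 'xs' => 'http://www.w3.org/2001/XMLSchema' }

# List every xs:element declaration as [name, required?].
# An element is treated as optional only when minOccurs="0".
def declared_elements(xsd)
  doc = REXML::Document.new(xsd)
  REXML::XPath.match(doc, '//xs:element', XS).map do |decl|
    [decl.attributes['name'], decl.attributes['minOccurs'] != '0']
  end
end

# Shortened copy of the Address schema.
SAMPLE_XSD = <<~XSD
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="Address">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="Town" type="xs:string" />
          <xs:element minOccurs="0" name="County" type="xs:string" />
        </xs:sequence>
      </xs:complexType>
    </xs:element>
  </xs:schema>
XSD

declared_elements(SAMPLE_XSD).each do |name, required|
  puts "#{name}: #{required ? 'required' : 'optional'}"
end
```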


Ruby XML Parsers

REXML: Ruby Electric XML

Standard with the Ruby language.

Slow on large documents.

libxml-ruby

Ruby bindings for the GNOME libxml2 XML toolkit.

Very fast (30X as fast as REXML).

Hpricot

Parses XML as well as HTML.

Fast (3-4X as fast as REXML).

Does not check for well-formedness or validity.


Types of Parsing

Tree Parsing (DOM-like)

Good for small documents.

Loads entire document into memory.

Simple API

Stream Parsing (SAX-like)

Good for large documents.

User defines callback methods, passes to API.

Parser runs callback methods on pattern match.


Tree Parsing

Loads entire XML doc into memory.

require 'rexml/document'
include REXML

# Parse the whole file into an in-memory tree.
input = File.new('data.xml')
doc = Document.new(input)
root = doc.root

Search document as a tree using XPath

# Print the title attribute of each <section> inside <ch>.
doc.elements.each("ch/section") do |e|
  puts e.attributes["title"]
end
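
The slide's code depends on a `data.xml` file that is not shown. As a self-contained variant, the sketch below parses an inline string instead. The contents of `CHAPTER_XML` are an assumption about what such a file might look like, and `section_titles` is a hypothetical helper.

```ruby
require 'rexml/document'

# Inline stand-in for data.xml; element and attribute names are assumptions.
CHAPTER_XML = <<~XML
  <ch>
    <section title="What is XML?"/>
    <section title="XML Structure"/>
    <section title="REXML"/>
  </ch>
XML

# Collect the title attribute of every <section> directly under <ch>.
def section_titles(xml)
  doc = REXML::Document.new(xml)
  titles = []
  doc.elements.each('ch/section') { |e| titles << e.attributes['title'] }
  titles
end

puts section_titles(CHAPTER_XML)
```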


Stream Parsing

Define listener class.

require 'rexml/document'
require 'rexml/streamlistener'

class MyListener
  include REXML::StreamListener

  # Called for each start tag with the tag name and its attributes.
  def tag_start(*args)
    puts "start: #{args.map { |x| x.inspect }.join(',')}"
  end
end

Invoke parser

include REXML

listen = MyListener.new
source = File.new('data.xml')
Document.parse_stream(source, listen)
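
A listener usually does more than print. As one possible variation, the sketch below counts how often each tag opens, without ever building a tree, and reads from a string so it runs without `data.xml`. `TagCounter` and `count_tags` are names invented for this example.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Counts how many times each tag opens while the parser streams the input.
class TagCounter
  include REXML::StreamListener

  attr_reader :counts

  def initialize
    @counts = Hash.new(0)
  end

  # Called by the parser for every start tag; attrs holds the attributes.
  def tag_start(name, attrs)
    @counts[name] += 1
  end
end

def count_tags(xml)
  listener = TagCounter.new
  REXML::Document.parse_stream(xml, listener)
  listener.counts
end

p count_tags('<todo><item>a</item><item>b</item></todo>')
```

Callbacks that the listener does not define (text, end tags, comments) fall back to the empty defaults supplied by `REXML::StreamListener`.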


XPath Searches

These examples use the search method provided by Hpricot, not REXML.

doc.search("p")

Find all paragraph tags in document.


doc.search("/html/body//p")

Find all paragraph tags within the body tag.

doc.search("//a[@src]")

Find all anchor tags with a src attribute.

doc.search("//a[@src='google.com']")

Find all a tags with a src attribute of google.com.
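
The same XPath expressions can be tried with REXML's `REXML::XPath.match`, which is the closest equivalent in the standard parser. The page below is a made-up sample, and `texts` is a hypothetical helper that returns the text of each matched node.

```ruby
require 'rexml/document'

# Made-up sample page for trying the queries.
PAGE = <<~XML
  <html>
    <body>
      <p>First</p>
      <div><p>Second</p></div>
      <a src="google.com">Search</a>
      <a>No attribute</a>
    </body>
  </html>
XML

# Return the text of every node matched by an XPath expression.
def texts(xml, xpath)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, xpath).map(&:text)
end

p texts(PAGE, '//p')                      # all paragraph tags
p texts(PAGE, '/html/body//p')            # paragraphs anywhere inside body
p texts(PAGE, '//a[@src]')                # anchors that have a src attribute
p texts(PAGE, "//a[@src='google.com']")   # anchors whose src is google.com
```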

