
Nov 18, 2013 (3 years and 8 months ago)


CIS336

Website design, implementation and management

(also Semester 2 of CIS219, CIS221 and IT226)

Lecture 7

HTTP and Web Programming in Java

(Based on Møller and Schwartzbach, 2006, Chapter 8)

David Meredith

d.meredith@gold.ac.uk


www.titanmusic.com/teaching/cis336-2006-7.html


The Internet and HTTP

HTTP: Hypertext Transfer Protocol
  a cornerstone of the infrastructure of the Web
  prescribes how machines on the web exchange
    HTML and XML documents
    form field values
    ...
  uses a client-server model
  communication follows a simple request-response pattern
    client always initiates the interaction
    client (e.g., browser) requests a resource by sending the URL of the resource (e.g., HTML file) to a server
    if server accepts request then it returns the resource


Network layers

Internet network protocols organised into a number of layers
Network Interface Layer is hardware used to communicate bits from one physical location to another (e.g., ethernet)

[Figure: the protocol stack, lowest layer first]
  THE NETWORK INTERFACE LAYER: Ethernet
  THE INTERNET LAYER: IP
  THE TRANSPORT LAYER: TCP, UDP
  THE APPLICATION LAYER: HTTP, FTP, SMTP, DNS
  OUR APPLICATIONS

Internet Layer

Internet Layer is that of the Internet Protocol (IP)
IP addresses
  used to identify machines on the network
  e.g., 158.223.1.118 is the IP address of the Department of Computing Web server (www.doc.gold.ac.uk)
  Internet Assigned Numbers Authority (IANA) manages allocation of IP addresses to organizations
  127.0.0.1 always refers to the current machine (also called localhost)
Datagram
  packet of data of limited size
  up to 65535 bytes, but only 1500 bytes on Ethernet network
IP defines how datagrams are sent across the network
  involves routing through intermediate machines
IP is an unreliable protocol
  datagrams may be lost, arrive out of order or be duplicated



Transport Layer

Transport layer contains Transmission Control Protocol (TCP)
  transmits data in a stream of unbounded size
  segments stream into IP datagrams and reassembles them at destination
Reliable protocol
  retransmits lost datagrams
  sorts datagrams into correct order when received
  discards duplicate datagrams
Connection-oriented
  connection set up between two machines
  data can be sent in both directions across connection (full-duplex)


Sockets and ports

End points of a TCP connection are called sockets
Each socket is associated with a particular port on a particular machine
Port is identified by an integer between 0 and 65535
  allows single machine to have many simultaneous connections, each to a different port
Ports 0-1023: well-known ports
  assigned to server applications executed by privileged processes (e.g., UNIX root user), e.g.,
    port 80 reserved for HTTP communication
    ports 20 and 21 reserved for FTP servers
    port 25 reserved for SMTP servers
    port 443 reserved for HTTPS
Ports 1024-49151: registered ports
  allocated by IANA to avoid vendor conflicts
  e.g., port 8080 reserved as alternative to 80 for running a web server using ordinary user privileges
Ports 49152-65535: dynamic or private ports
  can be freely used by any client or server program
Browsers obtain ports for their TCP sockets arbitrarily among unused non-well-known ports
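
The three port ranges above can be written down as a simple classification. This is only a minimal sketch: the class name PortRanges and the enum Kind are made up for illustration and do not come from any library.

```java
final class PortRanges {
    /** The three ranges named on the slide (names are illustrative). */
    enum Kind { WELL_KNOWN, REGISTERED, DYNAMIC }

    /** Classifies a port number; rejects values outside 0..65535. */
    static Kind classify(int port) {
        if (port < 0 || port > 65535) {
            throw new IllegalArgumentException("not a valid port: " + port);
        }
        if (port <= 1023) return Kind.WELL_KNOWN;   // 0-1023
        if (port <= 49151) return Kind.REGISTERED;  // 1024-49151
        return Kind.DYNAMIC;                        // 49152-65535
    }

    public static void main(String[] args) {
        for (int p : new int[] {80, 443, 8080, 50000}) {
            System.out.println(p + " -> " + classify(p));
        }
    }
}
```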

User Datagram Protocol (UDP)

User datagram protocol (UDP) is an alternative to TCP in the transport layer
UDP is unreliable and datagram-oriented
  faster than TCP
  can be used for voice and video where speed is important and occasional losses are acceptable
UDP provides foundation for the domain name system (DNS)

IP is getting old

Specifications for TCP/IP are from 1981
  original ideas from 1960s developed by DARPA
Most internet traffic uses IPv4
  more than 20 years old
  shortage of IP addresses
    even though IPv4 allows for about 4 billion addresses
IPv6 solves IP address shortage

Application Layer

Application layer contains applications of the transport layer, e.g.,
  HTTP, FTP, SMTP, DNS
HTTP requests and responses transmitted using TCP
Two versions of HTTP:
  HTTP/1.0
  HTTP/1.1
    becoming more prevalent
    provides better support for caching, bandwidth optimization, error notification, security and content negotiation


Domain Name System (DNS)

Defines structure of domain names
Defines services governing association of IP addresses with domain names
  e.g., association of 82.165.120.54 with www.titanmusic.com
Benefits of DNS
  can move services from one machine to another without changing domain name
  single domain name can be associated with many IP addresses
    allows replication of servers
    decreases workload
    improves fault tolerance
  many domain names can be associated with a single IP address
    virtual hosting
  domain names are easier to remember than IP addresses

URIs

URI identifies network resource and has the general form
  http://<host>:<port>/<path>?<query>
e.g.
  http://www.google.com/search?q=An+Introduction+to+XML+and+Web+Technologies
  scheme is http
  host is www.google.com which is a domain name that has been registered using DNS as being associated with one or more IP addresses
  no port specified (port 80 is the default for http)
  host and port identify web server program to be used to process request
  path is search
    path typically identifies file in server's file system or program that can generate appropriate response
  query here is q=An+Introduction+to+XML+and+Web+Technologies
    contains arguments to program that processes request
URI may also contain fragment identifier that accesses a particular part (fragment) of a resource
  prefixed by # symbol
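
The parts of a URI listed above can be pulled apart with the standard java.net.URI class. The following is a minimal sketch using the example address from this slide; the class name UriParts and the output format are made up for illustration.

```java
import java.net.URI;

final class UriParts {
    /** Returns a one-line summary of the components of an absolute URI. */
    static String describe(String text) {
        URI uri = URI.create(text);
        int port = uri.getPort(); // -1 when the URI does not name a port
        return "scheme=" + uri.getScheme()
             + " host=" + uri.getHost()
             + " port=" + (port == -1 ? "default" : String.valueOf(port))
             + " path=" + uri.getPath()
             + " query=" + uri.getRawQuery()   // raw form keeps the + signs as written
             + " fragment=" + uri.getFragment();
    }

    public static void main(String[] args) {
        System.out.println(describe(
            "http://www.google.com/search?q=An+Introduction+to+XML+and+Web+Technologies"));
    }
}
```

Note that getPath returns the path with its leading slash (/search), and getFragment returns null when the URI has no # part.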


Requests

HTTP request sent from client to server using TCP
Entering the address
  http://www.google.com/search?q=An+Introduction+to+XML+and+Web+Technologies
in a web browser causes
  TCP connection to be established with
    the IP address associated by DNS with www.google.com
    port 80 (default value)
  message such as one above to be sent from browser to server
Line 1 is a request line
  here, uses GET method to ask the server to send the resource
    /search?q=An+Introduction+to+XML+and+Web+Technologies
  using HTTP/1.1
Remaining lines are header lines, each with the form,
  field: value
HTTP/1.1 supports larger set of header fields than HTTP/1.0
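
The request message the slide refers to is not reproduced in this text. As a rough sketch of its shape (a request line, then header lines of the form field: value, then an empty line), a GET request could be assembled as below. The class name GetRequest is hypothetical, and the choice of header lines (Host and Connection: close) is an assumption made for the example, not what any particular browser sends.

```java
final class GetRequest {
    /**
     * Builds the text of a minimal HTTP/1.1 GET request.
     * Lines end with CRLF and an empty line closes the header.
     */
    static String build(String host, String target) {
        return "GET " + target + " HTTP/1.1\r\n"   // request line
             + "Host: " + host + "\r\n"            // header line: field: value
             + "Connection: close\r\n"             // assumed here, to end the exchange after one response
             + "\r\n";                             // empty line ends the header
    }

    public static void main(String[] args) {
        System.out.print(build("www.google.com",
            "/search?q=An+Introduction+to+XML+and+Web+Technologies"));
    }
}
```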

Request header fields

Host
  contains domain name and port number (if not omitted) of server that receives request
  optional in HTTP/1.0, mandatory in HTTP/1.1
User-Agent
  contains information about the user agent (e.g., browser) that sends the request
  allows response to be tailored for use in the client software
Referer
  allows client to specify URI of resource from which URI in request was obtained
  e.g., if HTML page contains an img link, then request for image will contain Referer field set to URI of HTML page

Accept header field

Specifies media types that are acceptable as a response to the request
  also called MIME types (Multipurpose Internet Mail Extensions)
  now used for much more than e-mail
Common media types are
  text/plain - plain, unformatted text
  text/html - HTML documents (not XHTML)
  text/xml - XML documents
  application/xml - for XML documents intended for application use, not human-readable XML (not clearly demarcated from text/xml)
  application/xhtml+xml - recommended for use with XHTML
  multipart/form-data - HTML-like form field values
  application/octet-stream - arbitrary binary data and data that doesn't fit into other categories
  image/jpeg - JPEG image
Long list of media types maintained by Internet Assigned Numbers Authority (IANA)
*/* means all media types
Quality parameter: mime-type;q=value
  value between 0 and 1 (default is 1)
  indicates that mime-type is only acceptable if the quality of the resource in media types with higher q values is less than value times the quality of the resource in mime-type format
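
To make the q parameter concrete, the sketch below splits an Accept header value into media types and orders them from highest to lowest q, treating a missing q as 1. It is a simplified illustration only: the class name AcceptHeader is hypothetical, the header value in main is invented, and the parsing ignores other parameters and malformed input.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class AcceptHeader {
    /** One media type from the header together with its q value. */
    static final class Entry {
        final String type;
        final double q;
        Entry(String type, double q) { this.type = type; this.q = q; }
    }

    /** Returns the media types ordered by descending q (ties keep header order). */
    static List<String> byPreference(String header) {
        List<Entry> entries = new ArrayList<>();
        for (String part : header.split(",")) {
            String[] pieces = part.trim().split(";");
            double q = 1.0; // default when no q parameter is given
            for (int i = 1; i < pieces.length; i++) {
                String p = pieces[i].trim();
                if (p.startsWith("q=")) q = Double.parseDouble(p.substring(2));
            }
            entries.add(new Entry(pieces[0].trim(), q));
        }
        // List.sort is stable, so equal q values stay in their original order
        entries.sort(Comparator.comparingDouble((Entry e) -> e.q).reversed());
        List<String> types = new ArrayList<>();
        for (Entry e : entries) types.add(e.type);
        return types;
    }

    public static void main(String[] args) {
        System.out.println(byPreference(
            "text/html, application/xml;q=0.9, */*;q=0.1, text/plain;q=0.5"));
    }
}
```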

Other request header fields

Accept-Language
  defines acceptability of natural languages
Accept-Encoding
  specifies accepted content codings
    usually compression techniques
Accept-Charset
  specifies accepted character sets
All can use q parameters

Responses

Response from server sent using same TCP connection as request
Response consists of
  header (lines 1-10 at left)
    begins with status line indicating overall result of attempt to satisfy request
    followed by header lines
  body (lines 12-24 at left)
    contains requested resource if request was successful
Response at left returned when request URI is
  http://www.brics.dk/index.html

Response status line

Status line (line 1 at left) tells us that
  response uses HTTP/1.1
  status code for request is 200 OK
    means request succeeded and resource follows header
Five classes of status codes:
  1xx indicates provisional, informational response
  2xx indicates success
    e.g., 200 OK
  3xx indicates redirection
    e.g., 301 Moved Permanently
  4xx indicates client error
    e.g., 404 Not Found
  5xx indicates server error
    e.g., 500 Internal Server Error
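
The five classes can be read off the first digit of the status code. A minimal sketch, assuming a well-formed status line such as HTTP/1.1 200 OK; the class name StatusLine and the returned labels are made up for illustration.

```java
final class StatusLine {
    /** Extracts the numeric status code from a status line like "HTTP/1.1 200 OK". */
    static int code(String statusLine) {
        String[] parts = statusLine.split(" ", 3); // version, code, reason phrase
        return Integer.parseInt(parts[1]);
    }

    /** Maps a status code to the class named on the slide. */
    static String statusClass(int code) {
        switch (code / 100) {
            case 1: return "informational";
            case 2: return "success";
            case 3: return "redirection";
            case 4: return "client error";
            case 5: return "server error";
            default: throw new IllegalArgumentException("unexpected status code: " + code);
        }
    }

    public static void main(String[] args) {
        String line = "HTTP/1.1 301 Moved Permanently";
        System.out.println(code(line) + " is a " + statusClass(code(line)) + " code");
    }
}
```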

HTTP Response header lines

Date shows date and time when response sent
Server contains information about the server software
ETag used for cache management
  usually digest of file size and last modification time
Content-Length gives size of body in bytes
Content-Type gives mime type of resource in body
Content-Encoding indicates whether resource has been compressed (e.g., with gzip)
Transfer-Encoding, if present, usually has value chunked, indicating that resource is being delivered in chunks
Location used with status codes 301 and 307 to give new location of resource
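
Header lines like those above can be collected into a map once the response header has been read. The sketch below is an illustration under simplifying assumptions: it expects CRLF line endings, skips the status line, stops at the empty line and ignores repeated or folded header lines. The class name ResponseHeaders and the sample response in main are invented.

```java
import java.util.Map;
import java.util.TreeMap;

final class ResponseHeaders {
    /** Parses "field: value" lines that follow the status line, up to the empty line. */
    static Map<String, String> parse(String response) {
        // header field names are case-insensitive, so look-ups should be too
        Map<String, String> fields = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        String[] lines = response.split("\r\n");
        for (int i = 1; i < lines.length; i++) {   // line 0 is the status line
            if (lines[i].isEmpty()) break;         // empty line separates header from body
            int colon = lines[i].indexOf(':');
            if (colon > 0) {
                fields.put(lines[i].substring(0, colon).trim(),
                           lines[i].substring(colon + 1).trim());
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        String sample = "HTTP/1.1 200 OK\r\n"
                      + "Content-Type: text/html\r\n"
                      + "Content-Length: 13\r\n"
                      + "\r\n"
                      + "<html></html>";
        System.out.println(parse(sample));
    }
}
```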

HTML Forms

When GO! button pressed, form field values sent to server as list of name-value pairs, encoded into a query string according to media type chosen using enctype attribute in form element
Default media type is application/x-www-form-urlencoded (URL encoding) which would produce following:
  bet=someone+else&email=toot%40pop.com&send=GO%21
    Fields listed in order of appearance in source
    & separates fields
    = separates name from value
    + replaces each space
    non-alphanumeric characters escaped
    line breaks encoded as %0d%0a
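
The URL encoding rules above are implemented by the standard java.net.URLEncoder class. The sketch below rebuilds the query string from this slide; the field names and values are the ones decoded from that string, while the class name FormEncoder is hypothetical. UTF-8 is assumed as the character encoding.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

final class FormEncoder {
    /** Encodes name-value pairs as application/x-www-form-urlencoded, in map order. */
    static String encode(Map<String, String> fields) {
        StringBuilder query = new StringBuilder();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            if (query.length() > 0) query.append('&');   // & separates fields
            query.append(escape(field.getKey()))
                 .append('=')                             // = separates name from value
                 .append(escape(field.getValue()));
        }
        return query.toString();
    }

    private static String escape(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");         // space becomes +, @ becomes %40
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);           // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>(); // keeps order of appearance
        fields.put("bet", "someone else");
        fields.put("email", "toot@pop.com");
        fields.put("send", "GO!");
        System.out.println(encode(fields));
    }
}
```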

get and post methods in an HTML form

If form method is get, then query string is appended to action URL:
  http://www.brics.dk/ixwt/echo?bet=someone+else&email=toot%40pop.com&send=GO%21
Request line in HTTP request will therefore be
  GET /ixwt/echo?bet=someone+else&email=toot%40pop.com&send=GO%21 HTTP/1.1
If form method is post, then query string is placed in body of HTTP request which might then be as above
  as in response, body of request separated by empty line from header
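
The POST request shown on the slide is not reproduced in this text. As a sketch of what such a request might look like, the code below puts the query string in the body, after an empty line, and states its length and media type in header lines. The class name PostRequest is hypothetical and the set of header lines is an assumption chosen for the example.

```java
import java.nio.charset.StandardCharsets;

final class PostRequest {
    /** Builds the text of a POST request carrying a URL-encoded query string as its body. */
    static String build(String host, String path, String query) {
        int length = query.getBytes(StandardCharsets.US_ASCII).length; // body size in bytes
        return "POST " + path + " HTTP/1.1\r\n"
             + "Host: " + host + "\r\n"
             + "Content-Type: application/x-www-form-urlencoded\r\n"
             + "Content-Length: " + length + "\r\n"
             + "\r\n"      // empty line separates header from body
             + query;      // the query string is the body
    }

    public static void main(String[] args) {
        System.out.println(build("www.brics.dk", "/ixwt/echo",
            "bet=someone+else&email=toot%40pop.com&send=GO%21"));
    }
}
```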

The difference between get and post

GET requests
  mainly for retrieving data
  safe to the client
    client not responsible for any side-effects on server
  idempotent - i.e., side effects of two or more identical requests are same as for one
  generated by clicking on an HTML link
  limited by maximum URL length imposed by browsers
  only possible media type is application/x-www-form-urlencoded
POST requests
  for operations that have side-effects on the server
  user usually responsible for any side effects on server
  not necessarily idempotent
    clicking "reload" on a page that results from a POST request causes browser to warn that this might repeat the action the form has carried out
  not limited by maximum URL length imposed by browsers
  used for sensitive information (e.g. passwords) because servers usually log request URIs but not request bodies

Web programming with Java

Java highly suitable for web (and XML) programming because
  it is platform independent
  it has a safe runtime model
    array bound checks, automatic garbage collection, bytecode verification, etc.
  supports multi-threading and concurrency
    useful for servers and clients
  supports Unicode
  comes with a suite of powerful libraries for network programming
Only other language that competes with it for web programming is C#

TCP/IP in Java

Accessing TCP/IP in Java usually requires
  java.net.InetAddress
    represents an IP address
    can do DNS look-ups
  java.net.Socket
    represents a TCP socket
  java.net.ServerSocket
    represents a server socket which is capable of waiting for requests from clients

Performing DNS look-up

Above program takes a single argument which should be a domain name
In line 7, getAllByName method used to produce an array of InetAddresses which contains the IP addresses associated with the domain name
In line 9, getHostAddress method used to get IP address from each InetAddress object in array a and print it out
getAllByName method may throw an UnknownHostException
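
The program this slide describes is not reproduced in this text, so the line numbers above refer to the original listing. The following is a sketch of a look-up along the same lines, using getAllByName and getHostAddress; the class name DnsLookup and the helper addressesOf are made up, and this is not the original program.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

final class DnsLookup {
    /** Returns the textual IP addresses associated with a domain name. */
    static List<String> addressesOf(String name) throws UnknownHostException {
        List<String> result = new ArrayList<>();
        for (InetAddress a : InetAddress.getAllByName(name)) {
            result.add(a.getHostAddress());
        }
        return result;
    }

    public static void main(String[] args) {
        try {
            for (String ip : addressesOf(args[0])) {   // args[0] is the domain name
                System.out.println(ip);
            }
        } catch (UnknownHostException e) {
            System.err.println("Unknown host: " + args[0]);
        }
    }
}
```

When the argument is already a literal IP address, getAllByName only checks its format and returns that single address without consulting DNS.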

Finding the domain name and IP address of current machine

Uses getLocalHost method in line 6 to construct an InetAddress object containing information about the name and IP address of the current machine on which the program is being executed
Use getHostName and getHostAddress in lines 7 and 8 to get the name and IP address of the current machine and print them out
getLocalHost method may throw an UnknownHostException
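
Again the original listing is not reproduced here. A minimal sketch of the same idea, assuming nothing beyond the three InetAddress methods named above; the class name LocalHostInfo and the helper describe are hypothetical.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

final class LocalHostInfo {
    /** Formats the name and IP address held by an InetAddress object. */
    static String describe(InetAddress address) {
        return address.getHostName() + " / " + address.getHostAddress();
    }

    public static void main(String[] args) {
        try {
            System.out.println(describe(InetAddress.getLocalHost()));
        } catch (UnknownHostException e) {
            System.err.println("Cannot determine the local host: " + e.getMessage());
        }
    }
}
```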

Making a TCP connection between a server and a client: The server

New ServerSocket created on line 7
Starts infinite loop in line 8, on each iteration of which,
  uses accept method in line 9 to get ss to listen for a connection to be made on the port given on the command line, then accepts it and creates a new socket, con, to represent the connection
  constructs an InputStreamReader, in, to read bytes from the input stream of con (line 10) and convert them to characters
  reads input using in, terminated with a 0 byte (lines 11-14) and stores in msg
  attaches PrintWriter object, out, to the output stream of con and prints "Simon says: " plus the message in msg on this stream (lines 15-17)
  closes the connection con (line 18)
accept method may throw an IOException
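
The server listing is not reproduced in this text. The steps above might be sketched as follows; this is not the original SimpleServer, and the class name SimpleServerSketch, the helper serve and the use of UTF-8 are assumptions. The per-connection work is kept in a method that takes plain streams so it can be tried without a network.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.Reader;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

final class SimpleServerSketch {
    /** Reads a message terminated by a 0 byte and replies with "Simon says: " plus the message. */
    static void serve(InputStream rawIn, OutputStream rawOut) throws IOException {
        Reader in = new InputStreamReader(rawIn, StandardCharsets.UTF_8);
        StringBuilder msg = new StringBuilder();
        int c;
        while ((c = in.read()) > 0) {        // stops at the 0 terminator or at end of stream (-1)
            msg.append((char) c);
        }
        PrintWriter out = new PrintWriter(new OutputStreamWriter(rawOut, StandardCharsets.UTF_8));
        out.print("Simon says: " + msg);
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket ss = new ServerSocket(Integer.parseInt(args[0]))) { // port from command line
            while (true) {
                try (Socket con = ss.accept()) {   // waits for a client, then handles it
                    serve(con.getInputStream(), con.getOutputStream());
                }                                  // con is closed here
            }
        }
    }
}
```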

Making a TCP connection between a server and a client: The client

Establishes a connection with the SimpleServer by giving its IP address and port as command line arguments
The third command line argument is a message to send to the server
Attaches a PrintWriter to the output stream associated with the connection (line 8)
Prints the message given as an argument to the program to this output stream and terminates the message with a zero byte
Then associates an InputStreamReader with the input stream associated with the connection and receives the message sent by the server
  The read method (line 14) returns -1 when end of stream is reached
Finally closes the connection (line 17)
getOutputStream method may throw an IOException
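
The client listing is likewise missing from this text. A sketch of a matching client, under the same assumptions as before (UTF-8, a 0 byte as terminator); the class name SimpleClientSketch and the helpers send and receive are made up and this is not the original program.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.Reader;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

final class SimpleClientSketch {
    /** Writes the message followed by a zero byte. */
    static void send(OutputStream rawOut, String msg) {
        PrintWriter out = new PrintWriter(new OutputStreamWriter(rawOut, StandardCharsets.UTF_8));
        out.print(msg);
        out.print('\0');   // terminator expected by the server
        out.flush();
    }

    /** Reads characters until read returns -1, i.e. until the server closes the connection. */
    static String receive(InputStream rawIn) throws IOException {
        Reader in = new InputStreamReader(rawIn, StandardCharsets.UTF_8);
        StringBuilder reply = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            reply.append((char) c);
        }
        return reply.toString();
    }

    public static void main(String[] args) throws IOException {
        // args: server address, port, message
        try (Socket con = new Socket(args[0], Integer.parseInt(args[1]))) {
            send(con.getOutputStream(), args[2]);
            System.out.println(receive(con.getInputStream()));
        }
    }
}
```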

HTTP in Java (The hard way)

Manually implements HTTP support on top of TCP/IP
Sends request to Google and extracts the result
Manually constructs an HTTP request (lines 8-11, 15-17) using fact that Google's "I'm Feeling Lucky" feature accepts GET requests of a particular format
Parses response using fact that response always contains a Location header line

First constructs a Socket and establishes a connection with Google server on port 80 (line 7)
Constructs a query string in the right format for the "I'm Feeling Lucky" feature (lines 8-11)
Writes the request header to an output stream attached to the socket (lines 12-18)
Reads response a line at a time until finds a header line starting with "Location:" (while loop starting in line 24)
Prints the URL value of this header line to standard output (line 26)
Closes connection (line 35)
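
The original program is not reproduced here, and the exact query format it used for the "I'm Feeling Lucky" feature is not given in this text. The sketch below therefore shows only the general technique, for an arbitrary host and request target supplied on the command line: write a request over a Socket, then scan the response header for a Location line. The class name ManualHttp and its helpers are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

final class ManualHttp {
    /** Builds a minimal GET request (header choice is an assumption). */
    static String request(String host, String target) {
        return "GET " + target + " HTTP/1.1\r\nHost: " + host + "\r\nConnection: close\r\n\r\n";
    }

    /** Returns the value of the Location header line, or null if the header has none. */
    static String findLocation(BufferedReader response) throws IOException {
        String line;
        while ((line = response.readLine()) != null && !line.isEmpty()) { // empty line ends header
            if (line.regionMatches(true, 0, "Location:", 0, 9)) {         // case-insensitive match
                return line.substring(9).trim();
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        String host = args[0];
        String target = args[1];
        try (Socket con = new Socket(host, 80)) {
            Writer out = new OutputStreamWriter(con.getOutputStream(), StandardCharsets.US_ASCII);
            out.write(request(host, target));
            out.flush();
            BufferedReader in = new BufferedReader(
                new InputStreamReader(con.getInputStream(), StandardCharsets.ISO_8859_1));
            System.out.println(findLocation(in));
        }
    }
}
```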

HTTP in Java (The easier way)

HttpURLConnection class makes it easier to create HTTP requests and parse responses
Above program does same as previous one but uses HttpURLConnection object to create a connection
First construct a URL object (line 13) then use its openConnection method to create a URLConnection
URLConnection is an abstract class but when URL's scheme is http, openConnection creates an HttpURLConnection
  return value of openConnection should therefore be cast to the correct class
Read http://www.google.com/terms_of_service.html before running this program!

Methods in HttpURLConnection

setRequestMethod
  sets request method (usually GET or POST)
setRequestProperty
  sets a field:value pair in a header line in the request
setDoInput
  should be set to true (default) if intend to read input from connection
setDoOutput
  set to true (false by default) if intend to write output to connection
connect
  establishes TCP connection
  usually not necessary since connection attempted at first write
getOutputStream
  gives output stream for request body of POST requests
getResponseCode
  returns response code (e.g., 200 for OK)
getHeaderField
  returns field from response header
getInputStream
  gives input stream for reading response body
Note that request header lines are called properties in HttpURLConnection
In HttpURLConnection, http redirects are followed by default
  can be disabled using setInstanceFollowRedirects(false)
  see line 15 above
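
The program referred to above is not reproduced in this text. The sketch below shows how several of the listed methods might fit together: it configures a connection without following redirects, then reads the response code and the Location header field. The class name EasyHttp, the helper prepare and the User-Agent value are invented for the example; the address is taken from the command line.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;

final class EasyHttp {
    /** Creates and configures a connection; nothing is sent until the response is requested. */
    static HttpURLConnection prepare(String address) throws IOException {
        URL url = URI.create(address).toURL();
        // for an http URL, openConnection returns an HttpURLConnection, so the cast is safe
        HttpURLConnection con = (HttpURLConnection) url.openConnection();
        con.setRequestMethod("GET");
        con.setRequestProperty("User-Agent", "EasyHttp-sketch"); // a request header line ("property")
        con.setInstanceFollowRedirects(false);                   // so a redirect response can be inspected
        return con;
    }

    public static void main(String[] args) throws IOException {
        HttpURLConnection con = prepare(args[0]);
        System.out.println(con.getResponseCode());               // connects implicitly
        System.out.println(con.getHeaderField("Location"));      // null if the header field is absent
        con.disconnect();
    }
}
```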

A simple Web server in Java

Takes two command line arguments
  a port
  the root directory for files to be served
Then instantiates the class FileServer and starts it (lines 26-27)

A simple Web server in Java

run method creates a ServerSocket
Starts infinite loop of processing requests
Only reads first line of each request (lines 45-6)

A simple Web server in Java

processRequest parses request line
First makes sure request is well-formed (lines 63-9)
Then ensures that URL does not contain "/." or end with a "~" (lines 72-5)
Then checks that, if the file is a directory, the URL ends with a '/'; if not, sends a "Moved Permanently" message back to the browser (which typically resends the request with the new URL) (lines 77-84)
If requested file is a directory, then path of returned file set to the file index.html in the directory (lines 86-8)
Attaches input stream to requested file (line 91)
Guesses content type of file (lines 92-3)
Prints out the response on the output print stream (lines 94-99)
Logs interaction (line 100)
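
The FileServer listing is not reproduced in this text. The first checks described above might look roughly like the sketch below. It is not the original processRequest: the class name RequestLineCheck and both helpers are hypothetical, and accepting only GET with exactly three space-separated parts is an assumption about what "well-formed" means.

```java
final class RequestLineCheck {
    /** Returns the requested path, or null if the request line is rejected. */
    static String acceptedPath(String requestLine) {
        String[] parts = requestLine.split(" ");
        // assumed notion of well-formed: METHOD, path, HTTP version
        if (parts.length != 3 || !parts[0].equals("GET") || !parts[2].startsWith("HTTP/")) {
            return null;
        }
        String path = parts[1];
        // refuse "/." (covers "/.." and hidden files) and trailing "~" (editor backup files)
        if (!path.startsWith("/") || path.contains("/.") || path.endsWith("~")) {
            return null;
        }
        return path;
    }

    /** For a directory requested without a trailing '/', returns the path to redirect to; else null. */
    static String directoryRedirect(String path, boolean isDirectory) {
        return (isDirectory && !path.endsWith("/")) ? path + "/" : null;
    }

    public static void main(String[] args) {
        System.out.println(acceptedPath("GET /index.html HTTP/1.1"));
        System.out.println(acceptedPath("GET /../secret HTTP/1.1"));
        System.out.println(directoryRedirect("/docs", true));
    }
}
```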


A simple Web server in Java

log method prints out record of each interaction
errorReport returns an HTML Error page to the client browser
sendFile sends the file as the body of an HTTP response as a sequence of bytes
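
Sending a file as a sequence of bytes comes down to copying one stream to another. A minimal sketch of that copy loop, not the original sendFile; the class name SendFileSketch, the helper copy and the buffer size are assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class SendFileSketch {
    /** Copies all bytes from the file stream to the output stream; returns the number copied. */
    static long copy(InputStream file, OutputStream out) throws IOException {
        byte[] buffer = new byte[4096];
        long total = 0;
        int n;
        while ((n = file.read(buffer)) != -1) {   // -1 signals end of stream
            out.write(buffer, 0, n);              // write only the bytes actually read
            total += n;
        }
        out.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long n = copy(new ByteArrayInputStream(new byte[] {1, 2, 3}), out);
        System.out.println(n + " bytes copied");
    }
}
```

In a server, the first argument would typically be a stream opened on the requested file and the second the output stream of the connection, written after the response header and its empty line.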