Hypertext Transfer Protocol

Dec 4, 2013 (3 years and 8 months ago)

SENG2220

Web Development II

Mohammed A. Saleh

http://ifm.ac.tz/staff/msaleh/SENG2220.html

6th November 2009

Module Content

The HTTP Protocol
- HTTP version 1.0.
- GET and POST methods.
- Request line.
- Status line.
- Headers.
- Carrying Data.
- Relationship to TCP.
- HTTP version 1.1.
- Methods available.
- Persistent connections.
- Chunked encoding.
- Mandatory headers.
- Future evolution of HTTP.
- Suitability of HTTP as transport for higher protocols.

Hypertext Transfer Protocol (HTTP)

- HTTP is a simple, stateless request-response protocol.
- It is the network protocol used to deliver virtually all files and other data (collectively called resources) on the World Wide Web.
  - The file may contain static data: HTML pages, GIFs, JPEGs, Microsoft Word documents, Adobe PDF documents, etc.
  - The file may be a program that runs on the server to output data: ASP, PHP, Perl, JSP, etc.
- HTTP takes place through TCP/IP sockets.
- A web client (user agent) requests a resource identified by a uniform resource locator (URL).
- The web server identified in the URL responds with the file identified in the URL.

Cont …

- The standard (and default) port for HTTP servers to listen on is 80.

What are "Resources"?
- A resource is some chunk of information that can be identified by a URL (it's the R in URL).
- The most common kind of resource is a file, but a resource may also be a dynamically generated query result or the output of a CGI script.
- HTTP/1.0 was highly successful.
- HTTP/1.1 was introduced to address flaws in 1.0 and improve network performance.
  - pipelining requests and responses

WWW Architecture

[Figure: a client (platform: Win, Mac, Unix; browser: IE, Mozilla, Opera) sends the request http://www.ifm.ac.tz/about/ across the network (HTTP over TCP/IP) to a server (platform: Win, Mac, Unix; web server: Apache, IIS), which returns the response <html>…</html>.]

WWW Architecture

- Client-Server Request-Response architecture
- You request a web page
  - e.g. http://www.ifm.ac.tz/about/index.html
  - HTTP request
- The web server responds with data
  - HTTP response
  - usually in the form of a web page (HTML document)
  - could be any file format
  - web page is written using HyperText Markup Language (HTML)
- Web pages are identified by a Uniform Resource Locator (URL)
  - protocol: e.g. http
  - web server: e.g. www.ifm.ac.tz
    - [machine name].[domain name]
  - web page: e.g. about/index.html
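
The three parts of a URL listed above can be pulled apart with Python's standard library. This is only an illustrative sketch; the variable names are chosen here and are not part of the slides.

```python
from urllib.parse import urlsplit

url = "http://www.ifm.ac.tz/about/index.html"
parts = urlsplit(url)

protocol = parts.scheme      # "http"
web_server = parts.hostname  # "www.ifm.ac.tz"
web_page = parts.path        # "/about/index.html"

# HTTP servers listen on port 80 unless the URL names another port.
port = parts.port or 80

print(protocol, web_server, port, web_page)
```

Note that the path keeps its leading slash, which is how it appears in the request sent to the server.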


Structure of HTTP Transactions

- HTTP uses the client-server model.
- An HTTP client opens a connection and sends a request message to an HTTP server.
  - The server then returns a response message, usually containing the resource that was requested.
  - After delivering the response, the server closes the connection.
- The formats of the request and response messages are similar, and English-oriented:
  - an initial line,
  - zero or more header lines,
  - a blank line (i.e. a CRLF by itself), and
  - an optional message body (e.g. a file, or query data, or query output).
- CR and LF here mean ASCII values 13 and 10, even though some platforms may use different characters.
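
As a rough illustration of the message layout described above (initial line, header lines, blank line, optional body), the sketch below splits a raw message into those parts. The function name `split_message` and the sample request are assumptions made for this example.

```python
CRLF = "\r\n"  # ASCII 13 followed by ASCII 10

def split_message(raw: str):
    """Split an HTTP message into (initial line, header lines, body)."""
    # The first blank line (a CRLF by itself) ends the headers.
    head, _, body = raw.partition(CRLF + CRLF)
    initial_line, *header_lines = head.split(CRLF)
    return initial_line, header_lines, body

raw_request = (
    "GET /about/index.html HTTP/1.0" + CRLF
    + "Host: www.ifm.ac.tz" + CRLF
    + CRLF  # blank line; this GET carries no body
)

initial_line, header_lines, body = split_message(raw_request)
print(initial_line)
print(header_lines)
print(repr(body))
```

The same function works for a response message, since both directions share this format.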

Initial Request Line

- The initial line is different for the request than for the response.
- A request line has three parts, separated by spaces:
  - a method name
  - the local path of the requested resource
  - the version of HTTP being used
- A typical request line is:

    GET /path/to/file/index.html HTTP/1.1

- Notes:
  - GET is the most common HTTP method; it says "give me this resource". Other methods include POST and HEAD. Method names are always uppercase.
  - The path is the part of the URL after the host name, also called the request URI.
  - The HTTP version always takes the form "HTTP/x.x", uppercase.
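
The three-part request line can be checked with a few lines of Python. This is a minimal sketch, assuming a hypothetical helper named `parse_request_line`; it only enforces the rules stated in the notes above.

```python
def parse_request_line(line: str):
    """Return (method, path, version) from an HTTP request line."""
    # Exactly three parts separated by single spaces; anything else
    # raises ValueError when unpacking.
    method, path, version = line.split(" ")
    if not method.isupper():
        raise ValueError("method names are uppercase: " + method)
    if not version.startswith("HTTP/"):
        raise ValueError("version must look like HTTP/x.x: " + version)
    return method, path, version

method, path, version = parse_request_line("GET /path/to/file/index.html HTTP/1.1")
print(method, path, version)
```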

HTTP Request

    GET /msaleh/index.html HTTP/1.1
    Host: staff.ifm.ac.tz
    Connection: close
    Accept: text/xml,text/html,text/plain,image/png,*/*
    Accept-Language: en-gb,en
    User-Agent: Mozilla/4.0 (compatible;MSIE 6.0;Windows NT 5.0)
    Accept-Charset: ISO-8859-1,utf-8;q=0.7,*
    If-Modified-Since: Mon, 18 Sep 2006 22:57:19 GMT
    Referer: http://web-sniffer.net

- Request line: method, file name, HTTP version
- Headers: the lines that follow the request line
- Blank line: marks the end of the headers
- Data: none for GET

Initial Response Line

- The initial response line, called the status line, also has three parts separated by spaces:
  - the HTTP version,
  - a response status code that gives the result of the request, and
  - an English reason phrase describing the status code.
- Typical status lines are:

    HTTP/1.0 200 OK

  or

    HTTP/1.0 404 Not Found

- Notes:
  - The HTTP version is in the same format as in the request line, "HTTP/x.x".
  - The status code is meant to be computer-readable; the reason phrase is meant to be human-readable, and may vary.
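
Because the reason phrase may itself contain spaces ("Not Found"), a parser should split the status line at the first two spaces only. A small sketch, with `parse_status_line` as an assumed helper name:

```python
def parse_status_line(line: str):
    """Return (version, status code, reason phrase) from a status line."""
    # maxsplit=2 keeps a multi-word reason phrase in one piece.
    version, code, reason = line.split(" ", 2)
    return version, int(code), reason

version, code, reason = parse_status_line("HTTP/1.0 404 Not Found")

# Programs should branch on the numeric code, not on the phrase.
if code == 404:
    print("resource missing:", reason)
```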

HTTP Response

    HTTP/1.0 200 OK
    Date: Thu, 21 Sep 2006 22:06:05 GMT
    Server: Apache/1.3.33 (Unix) PHP/4.3.10
    Connection: close
    Content-Type: text/html
    ETag: "5d150-141c-450f244f"
    Last-Modified: Mon, 18 Sep 2006 22:57:19 GMT
    Content-Length: 5184

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict
    <html xmlns="http://www.w3.org/1999/xhtml">
    ...
    </html>

- Status line: HTTP version, status code, reason phrase
- Headers: the lines that follow the status line
- Data: the document that follows the blank line

HTTP Server Status Codes

    Code  Description
    200   OK
    201   Created
    301   Moved Permanently
    302   Moved Temporarily
    400   Bad Request (not understood)
    401   Unauthorized
    403   Forbidden (not authorized)
    404   Not Found
    500   Internal Server Error
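
The first digit of a status code gives its general class, which is often all a client needs to decide what to do next. The sketch below groups the codes from the table that way; the function name `status_class` is an assumption made for this example.

```python
def status_class(code: int) -> str:
    """Map a status code to its class using the first digit."""
    classes = {
        1: "Informational",
        2: "Success",
        3: "Redirection",
        4: "Client error",
        5: "Server error",
    }
    return classes[code // 100]

for code in (200, 201, 301, 302, 400, 401, 403, 404, 500):
    print(code, status_class(code))
```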

Header Lines

- Headers are name/value pairs that appear in both request and response messages.
- The name of the header is separated from the value by a single colon.
- For example, this line in a request message:

    User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)

  provides a header called User-Agent whose value is Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1).
- The purpose of this particular header is to supply the web server with information about the type of browser making the request.
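
A header line can be split into its name and value at the first colon. The sketch below uses assumed helper names (`parse_header`, `parse_headers`) and stores names in lower case, since HTTP header names are case-insensitive.

```python
def parse_header(line: str):
    """Split 'Name: value' at the first colon only."""
    # Values such as URLs may contain further colons, so partition once.
    name, _, value = line.partition(":")
    return name.strip(), value.strip()

def parse_headers(lines):
    """Build a dictionary keyed by lower-cased header name."""
    return {name.lower(): value for name, value in map(parse_header, lines)}

name, value = parse_header(
    "User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
)
headers = parse_headers([
    "Host: staff.ifm.ac.tz",
    "Referer: http://web-sniffer.net",
])
print(name, "=", value)
print(headers["referer"])
```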

Request Headers

- HTTP clients use headers in the request message to identify themselves and control how content is returned.
- Example if you are using IE:
  - Accept: */*
    This header indicates that the browser will accept all types of content.
  - Accept-Language: en-gb
    The browser prefers British English content.
  - Accept-Encoding: gzip, deflate
    The browser can handle gzip or deflate compressed content.
  - Connection: Keep-Alive
    The browser is requesting the use of persistent TCP connections.
  - Referer: http://www.httpwatch.com/httpgallery/headers/
    This is supplied by the browser to indicate if the current request was the result of a link from another web page.
  - User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
    This identifies the browser as Internet Explorer version 6 running on Windows XP.

Response Headers

- HTTP servers use headers in the response message to specify how content is being returned and how it should be handled.
- Example if using IE:
  - Cache-Control: no-cache
    This header indicates whether the resource may be cached by the browser or any intermediate caches. The value no-cache disables all caching.
  - Content-Length: 2748
    This header contains the length in bytes of the resource (i.e. the gif image) that follows the headers.
  - Content-Type: image/gif
    The content is in GIF format.
  - Date: Wed, 4 Oct 2004 12:00:00 GMT
    This is the current date and time on the web server.
  - Expires: -1
    The Expires header specifies when the content should be considered to be out of date. The value -1 indicates that the content expires immediately and would have to be re-requested before being displayed again.

Response Headers

- Server: Microsoft-IIS/6.0
  The web server is an IIS 6 web server.
- X-AspNet-Version: 2.0.50727
  The web server is running ASP.NET 2.0.
- X-Powered-By: ASP.NET
  The web server is running ASP.NET.

HTTP Methods

- The HTTP method is supplied in the request line and specifies the operation that the client has requested.
- The two most commonly used methods are GET and POST:
  - GET for queries that can be safely repeated
  - POST for operations that may have side effects (e.g. ordering a book from an on-line store).

The GET Method
- It is used to retrieve information from a specified URI and is assumed to be a safe, repeatable operation by browsers, caches and other HTTP-aware components.
- Operations have no side effects and GET requests can be re-issued.
  - For example, displaying the balance of a bank account has no effect on the account and can be safely repeated.

HTTP Methods

- Most browsers will allow a user to refresh a page that resulted from a GET, without displaying any kind of warning.
- Proxies may automatically retry GET requests if they encounter a temporary network connection problem.
- The downside of GET requests is that they can only supply data in the form of parameters encoded in the URI (known as a Query String).
  - They cannot be used for uploading files or other operations that require large amounts of data to be sent to the server.
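
To make the query string concrete, the sketch below encodes parameters into a request URI and decodes them again with Python's standard library. The path `/bank/balance` and the parameter names are hypothetical, chosen to echo the bank-balance example.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical parameters for a balance lookup.
params = {"account": "12345", "view": "balance"}

# Client side: everything a GET sends must fit in the URI.
request_uri = "/bank/balance?" + urlencode(params)
request_line = "GET " + request_uri + " HTTP/1.1"

# Server side: recover the parameters from the query string.
received = parse_qs(urlsplit(request_uri).query)

print(request_line)
print(received)
```

`parse_qs` returns a list for each name because a query string may repeat a parameter.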

The POST Method
- Used for operations that have side effects and cannot be safely repeated.

HTTP Methods

- For example, transferring money from one bank account to another has side effects and should not be repeated without explicit approval by the user.
- If you try to refresh a page in Internet Explorer that resulted from a POST, it displays a message to warn you that there may be side effects.

HTTP Methods

- The POST request message has a content body that is normally used to send parameters and data.
- The IIS server returns two status codes in its response for a POST request:
  - The first is 100 Continue to indicate that it has successfully received the POST request.
  - The second is 200 OK after the request has been processed.
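
Unlike GET, a POST carries its parameters in the message body, so the request needs headers that describe that body. The following sketch builds such a message as text; the host, path and form fields are hypothetical, and form-style (URL-encoded) data is only one possible body format.

```python
from urllib.parse import urlencode

CRLF = "\r\n"

# Hypothetical form fields for a money transfer.
body = urlencode({"from": "A-100", "to": "B-200", "amount": "50"})

request = (
    "POST /bank/transfer HTTP/1.1" + CRLF
    + "Host: www.example.com" + CRLF
    + "Content-Type: application/x-www-form-urlencoded" + CRLF
    # Content-Length is the size of the body in bytes.
    + "Content-Length: " + str(len(body.encode("ascii"))) + CRLF
    + CRLF  # blank line separates headers from the body
    + body
)
print(request)
```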

HTTP Recap

- HTTP is a stateless protocol
  - Each HTTP request is independent of previous and subsequent requests
- HTTP/1.0 defaults to Connection: close
  - closes the channel of communication immediately after a response
- Connection: keep-alive was introduced to enable persistent connections
  - no need to re-negotiate a connection for each request
  - a connection can be re-used for multiple requests
- HTTP/1.1 defaults to keep-alive for efficiency
  - supports pipelining, so multiple requests can be sent on one connection without waiting for each response
- The stateless nature of HTTP has a big impact on how web applications are designed

State Preservation

- State preservation mechanisms come in three basic variations:
- Cookies
  - a cookie is a small piece of text stored on a user's computer by a web browser; it is stored locally on your computer's hard disk drive
  - store a small amount of information on the client
  - sent to the server at each HTTP request
- Session variables
  - a unique identifier is used to associate information stored on the server with a particular client
- Passing data at each request-response cycle
  - store information in the web page
  - appending data to a URL
  - hidden fields in HTML forms
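
Cookies and session variables are often combined: the cookie holds only an identifier, and the server keeps the real data under that identifier. The sketch below shows the idea with Python's `http.cookies` module; the cookie name `session_id`, its value and the stored data are all made up for illustration.

```python
from http.cookies import SimpleCookie

# Server side: ask the browser to store a small piece of text.
outgoing = SimpleCookie()
outgoing["session_id"] = "abc123"
set_cookie_header = outgoing.output()  # "Set-Cookie: session_id=abc123"

# The browser sends it back on each later request as
#   Cookie: session_id=abc123
incoming = SimpleCookie()
incoming.load("session_id=abc123")
session_id = incoming["session_id"].value

# Session variables: data kept on the server, found via the identifier.
sessions = {"abc123": {"user": "alice", "basket": ["book"]}}
state = sessions[session_id]

print(set_cookie_header)
print(state)
```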

Caching

- Web pages often contain content that remains unchanged for long periods of time.
  - For example, an image containing a company logo may be used without modification for many years.
- It is wasteful in terms of bandwidth and round trips to repeatedly download images or other content that is not regularly updated.
- HTTP supports caching so that content can be stored locally by the browser and reused when required.
- By carefully controlling caching, it is possible to reuse static content and prevent the storage of dynamic data.
- Browser caching is controlled by the use of the Cache-Control, Last-Modified and Expires response headers.

Caching

Preventing Caching
- Servers set the Cache-Control response header to no-cache to indicate that content should not be cached by the browser:

    Cache-Control: no-cache

Allowing Caching
- The Cache-Control header can be set to one of the following values to allow caching:
  - <absent>: If the Cache-Control header is not set, then any cache may store the content.
  - Private: The content is intended for use by a single user and should only be cached locally in the browser.
  - Public: The content may be cached in public caches (e.g. shared proxies).

Caching

- If the browser is to make effective use of cached content, two extra pieces of information should be supplied.
- The first is the modification date/time of the content. The server supplies this in the Last-Modified response header:

    Last-Modified: Wed, 15 Sep 2004 12:00:00 GMT

- The second piece of information is the expiration date, which is specified with the Expires header:

    Expires: Sun, 17 Jan 2038 19:14:07 GMT

- If a cached entry has a valid expiration date the browser can reuse the content without having to contact the server at all when a page or site is revisited.
- This greatly reduces the number of network round trips for frequently visited pages.
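
The freshness decision described above can be sketched as a comparison between the current time and the Expires date. The function name `is_fresh` and the fixed "now" are assumptions for the example; a real browser cache also takes Cache-Control and other headers into account.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def is_fresh(expires_header: str, now: datetime) -> bool:
    """True if a cached entry may be reused without contacting the server."""
    try:
        expires = parsedate_to_datetime(expires_header)
    except (TypeError, ValueError):
        # A value such as "-1" is not a date: treat it as already expired.
        return False
    return now < expires

# A fixed point in time keeps the example repeatable.
now = datetime(2009, 11, 6, tzinfo=timezone.utc)

print(is_fresh("Sun, 17 Jan 2038 19:14:07 GMT", now))
print(is_fresh("-1", now))
```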

Caching

- For example, the Google logo is set to expire in 2038 and will only be downloaded on your first visit to google.com

Let us s-QUIZ our BRAINS

1. What do the following acronyms stand for? HTML, HTTP, TCP, UDP, IP, FTP, SMTP, DNS and OSI
2. How many layers are found on the OSI reference model? How about the TCP/IP protocol stack? List them
3. Why is HTTP considered to be a stateless protocol?
4. Is there a need to maintain state on the web? What is a cookie?
5. Mention the two main HTTP headers. How do they differ?

Encoding

- When an HTTP client is reading a response message from a server it needs to know when it has reached the end of the message.
- This is important with persistent (keep-alive) connections, because a connection can only be re-used by another HTTP transaction after the response message has been fully received.
- There are three ways in which an HTTP server can indicate the end of the response message:

Cont …

Connection Closed by Server
- The connection can be closed at the end of the response message by the server.
- Prevents connections being re-used.

Content-Length Header
- The length of the content after the response headers can be specified in bytes with the Content-Length header.

Chunked Encoding
- The content can be broken up into a number of chunks, each of which is prefixed by its size in bytes.
- A zero size chunk indicates the end of the response message.

Cont …

- If a server is using chunked encoding it must set the Transfer-Encoding header to "chunked".
- Useful when a large amount of data is being returned to the client and the total size of the response may not be known until the request has been fully processed.
  - An example of this is generating an HTML table of results from a database query.
  - If you wanted to use the Content-Length header you would have to buffer the whole result set before calculating the total content size.
  - With chunked encoding you could just write the data one row at a time and write a zero sized chunk when the end of the query was reached.
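
The row-at-a-time idea can be sketched as a pair of functions that write and read chunks. On the wire each chunk size is written in hexadecimal and followed by CRLF. The function names are assumptions, and the decoder is deliberately simplified: it ignores optional chunk extensions and trailers.

```python
CRLF = b"\r\n"

def encode_chunked(pieces):
    """Yield each piece as a chunk: hex size, CRLF, data, CRLF."""
    for piece in pieces:
        if piece:  # an empty piece would look like the end marker
            yield b"%x" % len(piece) + CRLF + piece + CRLF
    yield b"0" + CRLF + CRLF  # zero-size chunk ends the message

def decode_chunked(data: bytes) -> bytes:
    """Reassemble the body from a complete chunked message."""
    body, pos = b"", 0
    while True:
        line_end = data.index(CRLF, pos)
        size = int(data[pos:line_end], 16)
        if size == 0:
            return body
        start = line_end + len(CRLF)
        body += data[start:start + size]
        pos = start + size + len(CRLF)

# Rows are sent as they are produced; no total length is needed up front.
rows = [b"<tr><td>row 1</td></tr>", b"<tr><td>row 2</td></tr>"]
wire = b"".join(encode_chunked(rows))
print(wire)
print(decode_chunked(wire))
```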

Key differences between HTTP/1.1 and HTTP/1.0

Version numbers
- The version number in an HTTP message refers to the hop-by-hop sender of the message, not the end-to-end sender.
  - For example, if an HTTP/1.1 origin server receives a message forwarded by an HTTP/1.1 proxy, it cannot tell from that message whether the ultimate client uses HTTP/1.0 or HTTP/1.1.
- HTTP/1.1 defines a Via header that describes the path followed by a forwarded message.

The OPTIONS method
- HTTP/1.1 introduces the OPTIONS method
  - A way for a client to learn about the capabilities of a server without actually requesting a resource.

Cont …

Upgrading to other protocols
- To ease the deployment of incompatible future protocols, HTTP/1.1 includes the new Upgrade request-header.
- A client can inform a server of the set of protocols it supports as an alternate means of communication.

Caching
- effective because a few resources are requested often by many users, or repeatedly by a given user
- employed in most Web browsers and in many proxy servers; occasionally caches are also employed in conjunction with certain origin servers
- eliminates the network communication with the origin server
- reduces bandwidth consumption, by avoiding the transmission of unnecessary network packets
- can reduce the load on origin servers

Cont …

Caching in HTTP/1.0
- An origin server may mark a response with an expiration time, using the Expires header.
- A cache may check the current validity of a response.
- Shortcomings: It did not allow either origin servers or clients to give full and explicit instructions to caches.
- Problems: incorrect caching of some responses that should not have been cached and failure to cache some responses that could have been cached.

Caching in HTTP/1.1
- provides explicit and extensible protocol mechanisms for caching
- a cache entry is fresh until it reaches its expiration time, at which point it becomes stale
- a cache need not discard a stale entry, but it normally must revalidate it with the origin server

Cont …

Bandwidth optimization
- Network bandwidth is almost always limited
  - Queueing delay caused by congestion
  - Wasting bandwidth increases latency
- HTTP/1.0 wastes bandwidth in several ways that HTTP/1.1 addresses
  - A typical example is a server's sending an entire (large) resource when the client only needs a small part of it
  - There was no way in HTTP/1.0 to request partial objects
- It is possible for bandwidth to be wasted in the forward direction
  - If an HTTP/1.0 server could not accept large requests, it would return an error code after bandwidth had already been consumed
  - What was missing?

Cont …

- What was missing was the ability to negotiate with a server and to ensure its ability to handle such requests before sending them.
- A client may need only part of a resource; for example, it may want to display just the beginning of a long document.
  - HTTP/1.1 range requests allow a client to request portions of a resource.
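
A range request names the bytes it wants in a Range header, and a server that honours it replies with status 206 (Partial Content) and a Content-Range header. The sketch below imitates both sides in memory; the function names and the sample document are assumptions, and error cases (such as a range beyond the end of the resource) are left out.

```python
def range_header(first: int, last: int) -> str:
    """Client side: ask for bytes first..last inclusive."""
    return "Range: bytes=%d-%d" % (first, last)

def apply_range(resource: bytes, first: int, last: int):
    """Server side: return (status, Content-Range value, partial body)."""
    last = min(last, len(resource) - 1)  # clamp to the end of the resource
    part = resource[first:last + 1]
    content_range = "bytes %d-%d/%d" % (first, last, len(resource))
    return 206, content_range, part

document = b"A long document whose beginning is all the client wants to show."

print(range_header(0, 14))
status, content_range, part = apply_range(document, 0, 14)
print(status, content_range, part)
```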


Persistent connections
- HTTP/1.0 made no provision for persistent connections.
  - Some implementations used a Keep-Alive header to request that a connection persist.
  - This design did not interoperate with intermediate proxies.
- HTTP/1.1 makes persistent connections the default.
  - HTTP/1.1 clients, servers, and proxies assume that a connection will be kept open after the transmission of a request and its response.

Cont …

- Persistent connections may be cleanly terminated for resource-management reasons.

Pipelining
- HTTP/1.1 encourages the transmission of multiple requests over a single TCP connection
  - each request must still be sent in one contiguous message
  - and a server must send responses (on a given connection) in the order that it received the corresponding requests.
- However, a client need not wait to receive the response for one request before sending another request on the same connection
  - a client could send a large number of requests over a TCP connection before receiving any of the responses
  - This practice is known as pipelining.
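
The ordering rule is what makes pipelining workable: the client matches responses to requests purely by position. The sketch below simulates this without a network. `fake_server` is a hypothetical stand-in that answers requests in arrival order, and the client uses Content-Length to find where each response ends.

```python
CRLF = b"\r\n"

def fake_server(wire: bytes) -> bytes:
    """Hypothetical server: answer each request in the order received."""
    out = b""
    for request in wire.split(CRLF + CRLF)[:-1]:
        path = request.split(b" ")[1]
        body = b"contents of " + path
        out += (b"HTTP/1.1 200 OK" + CRLF
                + b"Content-Length: %d" % len(body) + CRLF
                + CRLF + body)
    return out

def split_responses(wire: bytes):
    """Client side: use Content-Length to separate the responses."""
    bodies, pos = [], 0
    while pos < len(wire):
        head_end = wire.index(CRLF + CRLF, pos)
        head = wire[pos:head_end].decode("ascii")
        length = next(int(line.split(":")[1])
                      for line in head.split("\r\n")[1:]
                      if line.lower().startswith("content-length:"))
        start = head_end + 4  # skip the blank line
        bodies.append(wire[start:start + length])
        pos = start + length
    return bodies

# Three requests are written back-to-back before any response is read.
paths = [b"/a.html", b"/b.gif", b"/c.css"]
pipelined = b"".join(
    b"GET " + p + b" HTTP/1.1" + CRLF + b"Host: www.example.com" + CRLF + CRLF
    for p in paths
)
responses = split_responses(fake_server(pipelined))
for p, r in zip(paths, responses):
    print(p, "->", r)
```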

Questions