Servers: Concurrency and Performance


Jeff Chase

Duke University


HTTP Server


Creates a socket (socket())

Binds to an address (bind())

Listens to set up the accept backlog (listen())

Can call accept() to block waiting for connections

(Can call select() to check for data on multiple sockets)

Handle request:


GET /index.html HTTP/1.0\n
<optional body, multiple lines>\n
\n
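The steps above, rendered as a minimal C sketch (IPv4 and port 8080 are assumed for illustration; error handling is omitted):

#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

int main(void) {
    int lfd = socket(AF_INET, SOCK_STREAM, 0);         /* create a socket */
    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(8080);
    bind(lfd, (struct sockaddr *)&addr, sizeof addr);  /* bind to an address */
    listen(lfd, 128);                                  /* set up the accept backlog */
    for (;;) {
        int cfd = accept(lfd, NULL, NULL);             /* block waiting for a connection */
        /* ... read the request and send the response ... */
        close(cfd);
    }
}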

Inside your server

Requests pass through a series of queues on the way to the server application (Apache, Tomcat/Java, etc.): packet queues, the listen queue, and the accept queue.

Measures: offered load, response time, throughput, utilization.
Example: Video On Demand

Client() {
    fd = connect("server");
    write(fd, "video.mpg");
    while (!eof(fd)) {
        read(fd, buf);
        display(buf);
    }
}

Server() {
    while (1) {
        cfd = accept();
        read(cfd, name);
        fd = open(name);
        while (!eof(fd)) {
            read(fd, block);
            write(cfd, block);
        }
        close(cfd); close(fd);
    }
}

[MIT/Morris]

How many clients can the server support?

Suppose, say, 200 Kb/s video on a 100 Mb/s network link?

Performance “analysis”

Server capacity:

Network (100 Mb/s)

Disk (20 MB/s)

Obtained performance: one client stream

The server is limited by its software structure.

If a video is 200 Kb/s, the server should be able to support more than one client.

[MIT/Morris]

500?
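A quick check of the arithmetic behind that answer: 100 Mb/s divided by 200 Kb/s per stream is 100,000 / 200 = 500 concurrent streams, ignoring protocol overhead; yet the single-threaded server above delivers only one.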

WebServer Flow

TCP socket space (for server hosts 128.36.232.5 and 128.36.230.2):

state: listening; address: {*.6789, *.*}; completed connection queue; sendbuf; recvbuf

state: listening; address: {*.25, *.*}; completed connection queue; sendbuf; recvbuf

state: established; address: {128.36.232.5:6789, 198.69.10.10:1500}; sendbuf; recvbuf

Server flow:

Create ServerSocket

connSocket = accept()

read request from connSocket

read local file

write file to connSocket

close connSocket

Discussion: what does each step do and how long does it take?

Web Server Processing Steps

Accept Client Connection → Read HTTP Request Header → Find File → Send HTTP Response Header → Read File / Send Data

Accepting the connection and reading the request may block waiting on the network; finding and reading the file may block waiting on disk I/O.

Want to be able to process requests concurrently.

Process States and Transitions

States: running (user), running (kernel), ready, blocked.

Transitions: trap/return crosses between user and kernel mode; an interrupt or exception also enters the kernel; Sleep moves a running process to blocked; Wakeup moves it from blocked to ready; Run dispatches a ready process; Yield returns a running process to ready.

Server Blocking

accept() blocks when no connect requests are waiting on the listen queue.

What if the server has multiple ports to listen on? E.g., 80 for HTTP, 443 for HTTPS.

open/read/write on server files can block.

read() on a socket blocks if the client is sending too slowly.

write() on a socket blocks if the client is receiving too slowly.

Yup, TCP has flow control, like pipes.

What if the server blocks while serving one client, and another client has work to do?



Under the Hood

Requests start at arrival rate λ, alternate between the CPU and I/O devices (I/O request, I/O completion), and exit; throughput equals λ until some service center saturates.

Concurrency and Pipelining

Before: the CPU, DISK, and NET handle each request strictly one stage at a time.

After: the CPU, DISK, and NET stages of different requests overlap in a pipeline.


Better single-server performance

Goal: run at the server's hardware speed

Disk or network should be the bottleneck

Method:

Pipeline blocks of each request

Multiplex requests from multiple clients

Two implementation approaches:

Multithreaded server

Asynchronous I/O

[MIT/Morris]

Concurrent threads or processes

Using multiple threads/processes, so that only the flow processing a particular request is blocked

Java: extends Thread or implements the Runnable interface

Example: a multithreaded WebServer, which creates a thread for each request

Multiple Process Architecture

Advantages

Simple programming while addressing the blocking issue

Disadvantages

Many processes; large context switch overheads

Consumes much memory

Optimizations involving sharing information among processes (e.g., caching) are harder

Each of N processes (Process 1 ... Process N), in separate address spaces, runs the full pipeline: Accept Conn → Read Request → Find File → Send Header → Read File / Send Data.

Using Threads

Advantages

Lower context switch overheads

Shared address space simplifies optimizations (e.g., caches)

Disadvantages

Need kernel-level threads (why?)

Some extra memory needed to support multiple stacks

Need thread-safe programs, synchronization

Each of N threads (Thread 1 ... Thread N), sharing one address space, runs the full pipeline: Accept Conn → Read Request → Find File → Send Header → Read File / Send Data.



Multithreaded server

server() {
    while (1) {
        cfd = accept();
        read(cfd, name);
        fd = open(name);
        while (!eof(fd)) {
            read(fd, block);
            write(cfd, block);
        }
        close(cfd); close(fd);
    }
}

for (i = 0; i < 10; i++)
    threadfork(server);

When a thread waits for I/O, the thread scheduler runs another thread.

What about references to shared data?

Synchronization

[MIT/Morris]
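A hedged rendering of threadfork(server) using POSIX threads (the pool size of 10 comes from the slide; the listening socket is assumed to be set up as in the earlier socket/bind/listen sketch, and error handling is minimal):

#include <pthread.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/socket.h>

static void *server(void *arg) {
    int lfd = *(int *)arg;                      /* shared listening socket */
    char name[256], block[4096];
    for (;;) {
        int cfd = accept(lfd, NULL, NULL);      /* blocks; scheduler runs peers */
        if (cfd < 0) continue;
        ssize_t m = read(cfd, name, sizeof name - 1);
        if (m <= 0) { close(cfd); continue; }
        name[m] = '\0';
        int fd = open(name, O_RDONLY);
        if (fd >= 0) {
            ssize_t n;
            while ((n = read(fd, block, sizeof block)) > 0)
                write(cfd, block, (size_t)n);   /* may block on a slow client */
            close(fd);
        }
        close(cfd);
    }
    return NULL;
}

void start_pool(int lfd) {                      /* plays the role of threadfork() */
    static int shared_lfd;
    shared_lfd = lfd;
    for (int i = 0; i < 10; i++) {
        pthread_t t;
        pthread_create(&t, NULL, server, &shared_lfd);
    }
}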

Event-Driven Programming

One execution stream: no CPU concurrency.

Register interest in events (callbacks).

The event loop waits for events and invokes handlers.

No preemption of event handlers.

Handlers are generally short-lived.

(Picture: an event loop dispatching to a set of event handlers.)

[Ousterhout 1995]

Single Process Event Driven (SPED)

Single threaded

Asynchronous (non-blocking) I/O

Advantages

Single address space

No synchronization

Disadvantages

In practice, disk reads still block

A single event dispatcher drives every pipeline stage: Accept Conn → Read Request → Find File → Send Header → Read File / Send Data.

Asynchronous Multi-Process Event Driven (AMPED)

Like SPED, but uses helper processes/threads for disk I/O

Uses IPC to communicate with the helper processes

Advantages

Shared address space for most web server functions

Concurrency for disk I/O

Disadvantages

IPC between the main thread and the helper threads

The event dispatcher drives the pipeline as in SPED, but hands the Read File stage to a pool of helpers (Helper 1, Helper 2, ...).

This hybrid model is used by the “Flash” web server.
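To make the helper idea concrete, here is a hedged sketch (illustrative names and protocol, not Flash's actual code): the main event loop sends a pathname to a helper process over a socketpair; the helper faults the file into the buffer cache, possibly blocking on disk, then signals completion, which the main loop picks up with select():

#include <sys/socket.h>
#include <unistd.h>
#include <fcntl.h>

int start_disk_helper(void) {
    int fds[2];
    socketpair(AF_UNIX, SOCK_STREAM, 0, fds);    /* IPC channel to the helper */
    if (fork() == 0) {                           /* helper process */
        char path[256], buf[4096];
        ssize_t m;
        while ((m = read(fds[1], path, sizeof path - 1)) > 0) {
            path[m] = '\0';
            int fd = open(path, O_RDONLY);
            if (fd >= 0) {
                while (read(fd, buf, sizeof buf) > 0)
                    ;                            /* may block on disk I/O */
                close(fd);
            }
            write(fds[1], "ok", 2);              /* completion notification */
        }
        _exit(0);
    }
    close(fds[1]);
    return fds[0];   /* the main event loop select()s on this descriptor */
}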


Event-Based Concurrent Servers Using I/O Multiplexing

Maintain a pool of connected descriptors.

Repeat the following forever:

Use the Unix select function to block until:

(a) a new connection request arrives on the listening descriptor, or

(b) new data arrives on an existing connected descriptor.

If (a), add the new connection to the pool of connections.

If (b), read any available data from the connection.

Close the connection on EOF and remove it from the pool.

[CMU 15-213]

Select

If a server has many open sockets, how does it know when one of them is ready for I/O?

int select(int n, fd_set *readfds, fd_set *writefds,
           fd_set *exceptfds, struct timeval *timeout);

Issues with scalability: alternative event interfaces have been offered.
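A hedged sketch of the loop described on the previous slide, using select(); buffer handling and error paths are simplified:

#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

void event_loop(int lfd) {                     /* lfd: listening descriptor */
    fd_set pool;                               /* pool of descriptors */
    FD_ZERO(&pool);
    FD_SET(lfd, &pool);
    int maxfd = lfd;
    for (;;) {
        fd_set ready = pool;                   /* select() overwrites its argument */
        select(maxfd + 1, &ready, NULL, NULL, NULL);
        if (FD_ISSET(lfd, &ready)) {           /* (a) new connection request */
            int cfd = accept(lfd, NULL, NULL);
            FD_SET(cfd, &pool);
            if (cfd > maxfd) maxfd = cfd;
        }
        for (int fd = 0; fd <= maxfd; fd++) {  /* (b) data on a connected socket */
            if (fd == lfd || !FD_ISSET(fd, &ready))
                continue;
            char buf[4096];
            ssize_t n = read(fd, buf, sizeof buf);
            if (n <= 0) {                      /* EOF or error: drop from pool */
                close(fd);
                FD_CLR(fd, &pool);
            }
            /* else: handle the available data */
        }
    }
}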

Asynchronous I/O

struct callback {
    bool (*is_ready)();
    void (*cb)(arg);
    void *arg;
}

main() {
    while (1) {
        for (c = each callback) {
            if (c->is_ready())
                c->cb(c->arg);
        }
    }
}

Code is structured as a collection of handlers

Handlers are nonblocking

Create new handlers for blocking operations

When the operation completes, call the handler

[MIT/Morris]
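One possible concrete rendering of that pseudocode in C (the linked-list representation of the callback list is an assumption, not part of the original):

#include <stdbool.h>
#include <stddef.h>

struct callback {
    bool (*is_ready)(void *arg);   /* would the handler block right now? */
    void (*cb)(void *arg);         /* the handler itself */
    void *arg;
    struct callback *next;
};

static struct callback *callbacks; /* list of registered callbacks */

void event_loop(void) {
    for (;;) {
        for (struct callback *c = callbacks; c != NULL; c = c->next)
            if (c->is_ready(c->arg))
                c->cb(c->arg);     /* run the handler; it must not block */
    }
}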

Asynchronous server

init() {
    on_accept(accept_cb);
}

accept_cb(cfd) {
    on_readable(cfd, name_cb);
}

on_readable(fd, fn) {
    c = new callback(test_readable, fn, fd);
    add c to callback list;
}

name_cb(cfd) {
    read(cfd, name);
    fd = open(name);
    on_readable(fd, read_cb);
}

read_cb(cfd, fd) {
    read(fd, block);
    on_writeable(cfd, write_cb);
}

write_cb(cfd, fd) {
    write(cfd, block);
    on_readable(fd, read_cb);
}

[MIT/Morris]

Multithreaded vs. Async

Multithreaded:

Hard to program: locking code

Need to know what blocks

Coordination explicit

State stored on the thread's stack

Memory allocation implicit

Context switch may be expensive

Works on multiprocessors

Async:

Hard to program: callback code

Need to know what blocks

Coordination implicit

State passed around explicitly

Memory allocation explicit

Lightweight context switch

Uniprocessors only

[MIT/Morris]

Coordination example

Threaded server:

A thread for the network interface

An interrupt wakes up the network thread

A shared buffer, protected by locks and condition variables, is shared between the server threads and the network thread

Asynchronous I/O:

Poll for packets

How often to poll?

Or, an interrupt generates an event

Be careful: disable interrupts when manipulating the callback queue.

[MIT/Morris]

One View

Should You Abandon Threads?

No: important for high-end servers (e.g., databases).

But, avoid threads wherever possible:

Use events, not threads, for GUIs, distributed systems, low-end servers.

Only use threads where true CPU concurrency is needed.

Where threads are needed, isolate their usage in a threaded application kernel: keep most of the code single-threaded.

(Picture: event-driven handlers surrounding a threaded kernel.)

[Ousterhout 1995]

Another view

Events obscure control flow

For programmers and tools

Threads:

thread_main(int sock) {
    struct session s;
    accept_conn(sock, &s);
    read_request(&s);
    pin_cache(&s);
    write_response(&s);
    unpin(&s);
}

pin_cache(struct session *s) {
    pin(s);
    if( !in_cache(s) )
        read_file(s);
}

Events:

AcceptHandler(event e) {
    struct session *s = new_session(e);
    RequestHandler.enqueue(s);
}

RequestHandler(struct session *s) {
    …; CacheHandler.enqueue(s);
}

CacheHandler(struct session *s) {
    pin(s);
    if( !in_cache(s) ) ReadFileHandler.enqueue(s);
    else ResponseHandler.enqueue(s);
}

. . .

ExitHandler(struct session *s) {
    …; unpin(s); free_session(s);
}

Web server stages: Accept Conn. → Read Request → Pin Cache → Read File → Write Response → Exit

[von Behren]

Control Flow

(Same thread and event code, and the same stage diagram, as on the previous slide.)

Events obscure control flow

For programmers and tools

[von Behren]

Exceptions

Exceptions complicate control flow

Harder to understand program flow

Cause bugs in cleanup code

Threads:

thread_main(int sock) {
    struct session s;
    accept_conn(sock, &s);
    if( !read_request(&s) )
        return;
    pin_cache(&s);
    write_response(&s);
    unpin(&s);
}

pin_cache(struct session *s) {
    pin(s);
    if( !in_cache(s) )
        read_file(s);
}

Events:

AcceptHandler(event e) {
    struct session *s = new_session(e);
    RequestHandler.enqueue(s);
}

RequestHandler(struct session *s) {
    …; if( error ) return;
    CacheHandler.enqueue(s);
}

CacheHandler(struct session *s) {
    pin(s);
    if( !in_cache(s) ) ReadFileHandler.enqueue(s);
    else ResponseHandler.enqueue(s);
}

. . .

ExitHandler(struct session *s) {
    …; unpin(s); free_session(s);
}

(Stage diagram as before: Accept Conn. → Read Request → Pin Cache → Read File → Write Response → Exit)

[von Behren]

State Management

(Same thread and event code as on the Exceptions slide.)

Events require manual state management

Hard to know when to free

Use GC or risk bugs

[von Behren]




Internet Growth and Scale

The Internet

How to handle all those client requests raining on your server?

Servers Under Stress

(Plot: performance vs. load, in concurrent requests or arrival rate.) Ideal performance rises with load; at the peak, some resource is at its maximum; under overload, some resource thrashes and performance collapses.

[von Behren]

Response Time Components

Wire time (request) +

Queuing time +

Service demand +

Wire time (response)

Each component depends on the cost/length of the request and on load conditions at the server.

(Plot: latency vs. offered load.)

Queuing Theory for Busy People

Big Assumptions

The queue is First-Come-First-Served (FIFO, FCFS).

Request arrivals are independent (Poisson arrivals).

Requests have independent service demands.

I.e., the arrival interval and the service demand are exponentially distributed (denoted “M”).

M/M/1 Service Center

Offered load: a request stream arrives at rate λ, waits in the queue, and is then processed by a server with mean service demand D.

Utilization

What is the probability that the center is busy?

Answer: some number between 0 and 1.

What percentage of the time is the center busy?

Answer: some number between 0 and 100.

These are interchangeable: called utilization U.

If the center is not saturated, i.e., it completes all its requests in some bounded time, then:

U = λD  (arrivals/T × service demand)

This is the “Utilization Law”.

The probability that the service center is idle is 1 − U.
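A worked example with assumed numbers: if requests arrive at λ = 25 per second and each needs D = 20 ms of service, then U = λD = 25 × 0.02 = 0.5, so the center is busy half the time and idle with probability 1 − U = 0.5.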


Little’s Law

For an unsaturated queue in steady state, the mean response time R and the mean queue length N are governed by:

Little’s Law: N = λR

Suppose a task T is in the system for R time units. During that time:

λR new tasks arrive.

N tasks depart (all the tasks ahead of T).

But in steady state, the flow in balances the flow out.

Note: this means that throughput X = λ.
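Continuing with assumed numbers: at λ = 25 requests/second and a mean response time R = 40 ms, Little’s Law gives a mean queue length of N = λR = 25 × 0.04 = 1 task in the system on average.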

Inverse Idle Time “Law”

(Plot: R vs. U; R grows without bound as U approaches 1 (100%).)

The service center saturates as 1/λ approaches D: small increases in λ cause large increases in the expected response time R.

Little’s Law gives the response time R = D/(1 − U):

Intuitively, each task T’s response time R = D + DN (its own demand plus the demand of the N tasks ahead of it).

Substituting λR for N:  R = D + DλR

Substituting U for λD:  R = D + UR

R − UR = D  →  R(1 − U) = D  →  R = D/(1 − U)
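Plugging in the assumed D = 20 ms: at U = 0.5, R = 20/(1 − 0.5) = 40 ms; at U = 0.9, R = 200 ms; at U = 0.99, R = 2 seconds. Response time explodes as U approaches 1.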


Why Little’s Law Is Important

1. Intuitive understanding of FCFS queue behavior.

Compute the response time from the demand parameters (λ, D).

Compute N: how much storage is needed for the queue.

2. Notion of a saturated service center.

Response times rise rapidly with load and are unbounded.

At 50% utilization, a 10% increase in load increases R by about 10%.

At 90% utilization, a 10% increase in load increases R by 10x. (Check with R = D/(1 − U): going from U = 0.90 to U = 0.99 raises R from 10D to 100D.)

3. Basis for predicting the performance of queuing networks.

Cheap and easy “back of napkin” estimates of system performance based on observed behavior and proposed changes, e.g., capacity planning, “what if” questions.

What does this tell us about server behavior at saturation?

Under the Hood

Requests start at arrival rate λ, alternate between the CPU and I/O devices (I/O request, I/O completion), and exit; throughput equals λ until some service center saturates.


Common Bottlenecks


No more File Descriptors


Sockets stuck in TIME_WAIT


High Memory Use (swapping)


CPU Overload


Interrupt (IRQ) Overload

[Aaron Bannert]

Scaling Server Sites: Clustering

Clients reach a server array through a smart switch that advertises virtual IP addresses (VIPs) and operates at L4 (TCP) or L7 (HTTP, SSL, etc.).

Goals:

server load balancing

failure detection

access control filtering

priorities/QoS

request locality

transparent caching

What to switch/filter on?

L3: source IP and/or VIP

L4: (TCP) ports, etc.

L7: URLs and/or cookies

L7: SSL session IDs

Scaling Services: Replication

Distribute the service load across multiple sites (Site A, Site B, ...) reached over the Internet.

How to select a server site for each client or request?

Is it scalable?

Extra Slides

(Any new information on the following
slides will not be tested.)


Problems of Multi-Threaded Servers

High resource usage, context switch overhead, contended locks

Too many threads → throughput meltdown, response time explosion

Solution: bound the total number of threads

Event-Driven Programming

Event-driven programming, also called asynchronous I/O

Uses finite state machines (FSMs) to monitor the progress of requests

Yields efficient and scalable concurrency

Many examples: the Click router, the Flash web server, TP monitors, etc.

Java: asynchronous I/O

For an example see: http://www.cafeaulait.org/books/jnp3/examples/12/

Traditional Processes

Expensive and “heavyweight”

One system call per process

Fork overhead

Coordination

Events

Need async I/O

Need select

Wasn’t originally available

Not standardized

Immature

But efficient

Code is distributed all through the program

Harder to debug and understand

Threads

Separate interface and implementation

Pthreads interface

Implementation is user-level or kernel (native)

If user-level, needs async I/O

But hides the abstraction behind the thread interface


Reference

The State of the Art in Locally Distributed Web-Server Systems

Valeria Cardellini, Emiliano Casalicchio, Michele Colajanni, and Philip S. Yu