
CS 620 Advanced Operating Systems

Lecture 5


Professor Timothy Arndt

BU 331



Keep programs interactive.

Programs may block waiting for I/O.

This is a bad thing if the program is interactive.

Background save.

Consider saving a large file to a slow disk system.

The user has to wait for the save operation to finish before continuing to work.

Multiple clients.

Consider also a server program with multiple clients.

We must fork off a new process for each client.


Basic idea

Multiple lightweight processes.

Traditional processes are created very slowly and use many
system resources.

Threads are similar to processes.

They have their own stack and PC.

Single address space.

By sharing a single address space, there is no need to have
separate (costly) address spaces.

Since threads of a single program cooperate (unlike processes
of separate programs) it is possible for them to share an
address space.
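As a minimal POSIX threads sketch (assuming a pthreads-capable Unix system; the worker function and the shared counter are illustrative only), two threads share the same global data while each gets its own stack:

    #include <pthread.h>
    #include <stdio.h>

    int shared_counter = 0;              /* one copy, visible to all threads */

    void *worker(void *arg) {
        int local = 0;                   /* lives on this thread's private stack */
        local++;
        shared_counter++;                /* unsynchronized here; see the mutex sketch later */
        printf("thread %ld: local=%d shared=%d\n", (long)arg, local, shared_counter);
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, worker, (void *)1L);
        pthread_create(&t2, NULL, worker, (void *)2L);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        return 0;
    }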

Thread Usage in Nondistributed Systems

Context switching as the result of IPC


Threads also share:

open files

child processes



Synchronization primitives available.

Necessary since threads share a common memory.

Semaphores, mutexes, etc.
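A minimal sketch of mutex use with POSIX threads (the account balance is an invented example): the lock serializes updates to data that all threads share.

    #include <pthread.h>

    static int balance = 0;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    void deposit(int amount) {
        pthread_mutex_lock(&lock);    /* only one thread at a time past this point */
        balance += amount;            /* shared memory updated safely */
        pthread_mutex_unlock(&lock);
    }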


So what’s a thread?


Program counter

What about I/O, signals, global variables (like errno)?


Is it managed from user or kernel level?

User threads have a very cheap task context

Kernel threads handle blocking I/O cleanly

In order for user threads not to block, we need extended models for I/O.

E.g. select() indicates which files are ready to transfer data so we don’t block (a sketch follows below).

Hybrid is also possible
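A hedged sketch of the select() idea in C (POSIX; the helper name and zero timeout are illustrative): the call reports which descriptors are ready, so a user-level thread scheduler can avoid issuing a read() that would block the whole process.

    #include <sys/select.h>

    /* Returns 1 if fd can be read without blocking, 0 otherwise. */
    int ready_to_read(int fd) {
        fd_set readfds;
        struct timeval timeout = {0, 0};   /* poll: do not block at all */

        FD_ZERO(&readfds);
        FD_SET(fd, &readfds);
        return select(fd + 1, &readfds, NULL, NULL, &timeout) > 0 &&
               FD_ISSET(fd, &readfds);
    }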

Thread Implementation

Combining kernel-level lightweight processes and user-level threads.


Preemption can be implemented via signals

Should user-level threads be preempted?

Easier programming model if processes yield() the processor voluntarily

But it is a nuisance to program with extra yield() calls

Preemption can be controlled with special no-preempt regions


So how do you use threads?

User interface/computation/I/O handled
separately (think of a browser)

Pop-up server threads

On multiprocessor systems, we can have
threads working in parallel on the multiple
processors as an alternative to shared memory

Multithreaded Servers (1)

A multithreaded server organized in a dispatcher/worker model.
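A minimal sketch of the dispatcher/worker organization using POSIX threads (the queue layout, handle_request, and get_next_request are invented for illustration): the dispatcher accepts requests and hands them to worker threads through a shared queue.

    #include <pthread.h>

    #define MAX_REQS 64

    static int queue[MAX_REQS];               /* request descriptors, e.g. sockets */
    static int head = 0, tail = 0, count = 0;
    static pthread_mutex_t qlock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  nonempty = PTHREAD_COND_INITIALIZER;

    void handle_request(int req);              /* hypothetical: do the real work */
    int  get_next_request(void);               /* hypothetical: e.g. accept() a client */

    void *worker(void *arg) {
        for (;;) {
            pthread_mutex_lock(&qlock);
            while (count == 0)
                pthread_cond_wait(&nonempty, &qlock);
            int req = queue[head];
            head = (head + 1) % MAX_REQS;
            count--;
            pthread_mutex_unlock(&qlock);
            handle_request(req);               /* may block on I/O; only this worker waits */
        }
        return NULL;
    }

    void dispatcher(int nworkers) {
        pthread_t t;
        for (int i = 0; i < nworkers; i++)
            pthread_create(&t, NULL, worker, NULL);
        for (;;) {
            int req = get_next_request();
            pthread_mutex_lock(&qlock);
            if (count < MAX_REQS) {            /* drop requests if the queue is full */
                queue[tail] = req;
                tail = (tail + 1) % MAX_REQS;
                count++;
                pthread_cond_signal(&nonempty);
            }
            pthread_mutex_unlock(&qlock);
        }
    }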

Threads in Windows

Each process in Windows contains one or more threads.

Threads are the executable units in Windows, not processes.

Threads have the following attributes:

The PID of the process that owns it.

A numeric base priority specifying its importance relative to
other threads.

A dynamic priority.

Its execution time so far.

An allowed processor set.

An exit status.

Threads in Windows

The Windows Kernel schedules threads and
handles interrupts and exceptions.

The Kernel schedules (or dispatches) threads for the processor in order of priority.

It also preempts threads of lower priority in favor of threads of higher priority.

It can force context switches, directing the processor to drop one task and pick up another.

Therefore code operating in this system must be reentrant (able to be interrupted and resumed unharmed, and shared by different threads executing the code on different processors).

Threads in Windows

The Kernel’s own code does not, technically, run in threads.

Hence it is the only part of the OS that is not preemptible or pageable.

The rest of the threads in Windows are preemptible and fully reentrant.

Code which is non-reentrant can cause serialization, damaging the performance of the OS on SMP machines.

The Kernel schedules ready threads for processor time based upon their dynamic priority, a number from 1 to 31.

Threads in Windows

The highest priority thread always runs on the processor, even if this requires that a lower-priority thread be interrupted.

The base priority class of a process establishes a range for the base priority of the process and its threads. The base priority classes are:

Idle

Normal

High

Real-time
The base priority of a process varies within the range established by its base priority class.

Threads in Windows

When a user interacts with a process (the process window is at
the top of the window stack), Windows boosts the priority of
the process to maximize its response.

The base priority of a thread is a function of the base priority of the process in which it runs. It varies within +/- 2 of the base priority of the process.

The dynamic priority of a thread is a function of its base priority. Windows continually adjusts the dynamic priority of threads within the range established by the base priority.

The base priority class of a running process can be changed by
using Task Manager.
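The same values can also be adjusted programmatically; a minimal Win32 C sketch (SetPriorityClass and SetThreadPriority are real Win32 calls, but the particular priority values chosen here are just an example):

    #include <windows.h>

    int main(void) {
        /* Raise the base priority class of the current process. */
        SetPriorityClass(GetCurrentProcess(), HIGH_PRIORITY_CLASS);

        /* Nudge this thread's base priority within the class's range. */
        SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_ABOVE_NORMAL);
        return 0;
    }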

Threads in Windows

The Windows Kernel takes maximum advantage of multiprocessor configurations by implementing symmetric multiprocessing (SMP) and soft affinity.

SMP allows the threads of any process, including the OS, to
run on any processor.

The threads of a single process can run on different processors
at the same time.

With soft affinity, the Kernel attempts to run the thread on the
last processor it ran on.

Applications can restrict threads to run only on certain processors (hard affinity).
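A minimal Win32 C sketch of hard affinity (SetThreadAffinityMask is a real call; the processor mask used here is an arbitrary example):

    #include <windows.h>

    int main(void) {
        /* Restrict the calling thread to processors 0 and 1 (bit mask 0x3). */
        SetThreadAffinityMask(GetCurrentThread(), 0x3);
        return 0;
    }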

Threads in Windows

The Kernel manages two types of objects:

Dispatcher objects have a signal state (signaled or
nonsignaled) and control dispatching and
synchronization of system operations.

Semaphores, mutexes, events, etc.

Control objects are used to control the operation of
the Kernel but do not affect dispatching.

Processes, interrupts, etc.

The I/O Manager (a component of the Windows Executive) supports asynchronous I/O.

Asynchronous I/O allows an application to continue
working while an I/O operation completes.

Threads in Windows

A thread may wait for the I/O to complete, or we may use an asynchronous procedure call (APC) that the I/O Manager calls when the I/O completes, or we may use a synchronization object (e.g. an event) that the I/O system sets to the signaled state when the I/O completes.
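A minimal Win32 C sketch of the event-based option (the calls are real Win32 APIs; the file name and buffer size are made up): start an overlapped read, do other work, then wait for the event the I/O system signals on completion.

    #include <windows.h>

    int main(void) {
        char buf[4096];
        DWORD nread = 0;
        OVERLAPPED ov = {0};

        /* Open the file for overlapped (asynchronous) I/O. */
        HANDLE f = CreateFileA("data.bin", GENERIC_READ, FILE_SHARE_READ, NULL,
                               OPEN_EXISTING, FILE_FLAG_OVERLAPPED, NULL);
        ov.hEvent = CreateEvent(NULL, TRUE, FALSE, NULL);  /* manual-reset event */

        /* Start the read; it typically returns at once with the I/O still pending. */
        ReadFile(f, buf, sizeof buf, NULL, &ov);

        /* ... do other useful work here while the I/O is in progress ... */

        /* Block until the I/O system sets the event to the signaled state. */
        WaitForSingleObject(ov.hEvent, INFINITE);
        GetOverlappedResult(f, &ov, &nread, FALSE);

        CloseHandle(ov.hEvent);
        CloseHandle(f);
        return 0;
    }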

The Process Manager creates and deletes processes and tracks process objects and thread objects.

The subsystems define the rules for threads and processes.

System Model

We look at three models:

Workstations (zero cost solution)

Clusters (a.k.a. NOW, a.k.a. COW, a.k.a. LAMP, a.k.a. Beowulf, a.k.a. pool)

Hybrid (each user has a workstation and also uses a pool for big jobs)


Workstation model

Connect workstations in department via LAN

Includes personal workstations and public ones

We often have dedicated file servers

Workstation Model



The workstations can be diskless

Not so popular anymore (disks are cheap)

Maintenance is easy

Must have some startup code in ROM

If you have a disk on the workstation, you can use it for:

1. Paging and temporary files

2. 1. + (some) system executables

3. 2. + file caching

4. full file system


Case 1 is often called dataless

Just as easy to maintain (software) as diskless

We still need startup code in ROM

Case 2

Reduces load more and speeds up program start time

Adds maintenance since new releases of programs must be
loaded onto the workstations

Case 3

We can have a very few executables permanently on the disk

Must keep the caches consistent

Not trivial for data files with multiple writers

This issue comes up for NFS as well

Should you cache whole files or blocks?


Case 4

You can work if just your machine is up

But you lose location transparency

Also requires the most maintenance

Using idle workstations

Early systems did this manually via rsh

Still used today.

How do you find idle workstations?


Idle = no mouse or keyboard activity and low load

Workstation can announce it is idle and this is
recorded by all

A job looking for a machine can inquire

Must worry about race conditions

Some jobs want a bunch of machines so they look
for many idle machines

Can also have centralized solution, processor server

Usual tradeoffs apply here

What about the local environment?


Files on servers are no problem

Requests for local files must be sent home

... but not needed for temporary files

System calls for memory or process management probably
need to be executed on the remote machine

Time is a bit of a mess unless we have time synchronized by a system like NTP

If a program is interactive, we must deal with devices

mouse, keyboard, display

What if the borrowed machine becomes non-idle (i.e. the owner returns)?


Detect presence of user.

Kill off the guest processes.

Helpful if we made checkpoints (or ran short jobs)

Erase files, etc.

We could try to migrate the guest processes to other hosts but
this must be very fast or the owner will object.

Our goal is to make the owner unaware of our presence.

May not be possible since you may have paged out their basic environment (shell, editor, X server, window manager) that they left running when they stopped using the machine.


A bunch of workstations without displays in a machine room, connected by a network.

They are quite popular now.

Indeed some clusters are packaged by their
manufacturer into a serious compute engine.

Ohio Supercomputing Center replaced MPP and
Vector supercomputers with clusters

Used to solve large problems using many
processors at one time

Pluses of a large time-sharing system vs. small individual machines.

A Cluster System


Also the minuses of timesharing.

We can use easy queuing theory to show that a large fast server is better in some cases than many slower personal machines.
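A hedged sketch of the standard M/M/1 argument (my notation, not the slides'): assume jobs arrive at total rate \lambda, each of n small machines serves at rate \mu, and the single big machine serves at rate n\mu. Then the big machine's mean response time is n times smaller:

    T_{\text{small}} = \frac{1}{\mu - \lambda/n}, \qquad
    T_{\text{big}} = \frac{1}{n\mu - \lambda} = \frac{T_{\text{small}}}{n}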


Each user has a workstation and uses the pool for big jobs.

It is the dominant model for cluster-based computing.


Virtualization has a long history.

It was important in the 1960s/70s

Faded during the 1980s/90s

Increasing importance nowadays


Ease of management

Different types of virtualization

Process virtual machine

Virtual machine monitor (hardware virtual machine)


Process virtual machine


Macromedia Flash Player






Virtual machine monitor

Microsoft Virtual PC

The Role of Virtualization in Distributed Systems

(a) General organization between a program,
interface, and system. (b) General organization of
virtualizing system A on top of system B.

Architectures of Virtual Machines

Interfaces at different levels

An interface between the hardware and software consisting of machine instructions that can be invoked by any program.

An interface between the hardware and software, consisting of machine instructions that can be invoked only by privileged programs, such as an operating system.

Architectures of Virtual Machines

Interfaces at different levels

An interface consisting of system calls as offered by
an operating system.

An interface consisting of library calls, generally forming what is known as an application programming interface (API).

In many cases, the aforementioned system calls are
hidden by an API.

Architectures of Virtual Machines

Figure 3-6. Various interfaces offered by computer systems.

Architectures of Virtual Machines

A process virtual machine, with multiple instances of (application, runtime) combinations.

Architectures of Virtual Machines

A virtual machine monitor, with multiple instances of (applications, operating system) combinations.

Processor Allocation

Decide which processes should run on which processors.

Could also be called process allocation.

We assume that any process can run on any machine.

Processor Allocation

Often the only difference between different
processors is:

CPU speed

CPU speed and amount of memory

What if the processors are not homogeneous?

Assume that we have binaries for all the different architectures.

What if not all machines are directly connected?

Send process via intermediate machines

Processor Allocation

If we have only PowerPC binaries, restrict the
process to PowerPC machines.

If we need machines very close for fast
communication, restrict the processes to a group of
close machines.

Can you move a running process or are processor allocations done at process creation time?

Migratory allocation algorithms vs. non-migratory.

Processor Allocation

What is the figure of merit, i.e. what do we
want to optimize in order to find the best
allocation of processes to processors?

Similar to CPU scheduling in centralized operating systems.

Minimize response time is one possibility.

Processor Allocation

We are not assuming all machines are equally fast.

Consider two processes. P1 executes 100 million instructions, P2 executes 10 million instructions.

Both processes enter system at time t=0

Consider two machines: A executes 100 MIPS, B 10 MIPS

If we run P1 on A and P2 on B each takes 1 second so
average response time is 1 sec.

If we run P1 on B and P2 on A, P1 takes 10 seconds and P2 takes 0.1 sec., so the average response time is 5.05 sec.

If we run P2 then P1, both on A, they finish at times 0.1 and 1.1, so the average response time is 0.6 seconds!

Processor Allocation

Minimize response ratio

Response ratio is the time to run on some machine
divided by time to run on a standardized
(benchmark) machine, assuming the benchmark
machine is unloaded.

This takes into account the fact that long jobs should
take longer.
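Written as a formula (my notation; T denotes elapsed response time):

    \text{response ratio} = \frac{T_{\text{on the chosen machine}}}{T_{\text{on the unloaded benchmark machine}}}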

Maximize CPU utilization

Maximize throughput

Jobs per hour

Weighted jobs per hour

Processor Allocation

If weighting is CPU time, we get CPU utilization.

This is the way to justify CPU utilization (user-centric).

Design issues



Deterministic vs. heuristic algorithms

Use deterministic for embedded applications, when all requirements are known a priori.

Patient monitoring in a hospital

Nuclear reactor monitoring



Centralized vs. distributed

We have a tradeoff of accuracy vs. fault tolerance and bottlenecks.

Processor Allocation


Optimal vs. best effort

Optimal normally requires off-line processing.

Similar requirements as for deterministic.

Usual tradeoff of system effort vs. result quality.

Transfer policy

Does a process decide to shed jobs just based on its
own load or does it have (and use) knowledge of
other loads?

Also called local vs. global

Usual tradeoff of system effort (gather data) vs.
result quality.

Processor Allocation

Location policy


Sender initiated vs. receiver initiated

Sender initiated: uploading programs to a compute server

Receiver initiated: downloading Java applets

Look for help vs. look for work.

Both are done.

Processor Allocation

Implementation issues

Determining local load

Normally use a weighted mean of recent loads with
more recent weighted higher.
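A minimal sketch of such a load estimate in C (the smoothing factor ALPHA and the sampling function are illustrative assumptions): an exponentially weighted average gives the most recent samples the most weight.

    #define ALPHA 0.3              /* weight given to the newest sample (assumed) */

    static double smoothed_load = 0.0;

    double sample_run_queue_length(void);   /* hypothetical: measure current load */

    /* Call periodically, e.g. every few seconds. */
    void update_load(void) {
        double now = sample_run_queue_length();
        smoothed_load = ALPHA * now + (1.0 - ALPHA) * smoothed_load;
    }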

Processor Allocation

Example algorithms

Min cut deterministic algorithm

Define a graph with processes as nodes and IPC
traffic as arcs

Cut the graph (i.e. some arcs) into pieces so that:

All nodes in one piece can be run on one processor

Memory constraints

Processor completion times

Values on cut arcs are minimized

Processor Allocation

Minimize the max

minimize the maximum traffic for a process pair

Minimize the sum

minimize total traffic

Minimize the sum to/from a piece

don't overload a processor

Minimize the sum between pieces

minimize traffic for processor pair

Tends to get hard as you get more realistic
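A hedged way to write the basic objective (my notation, not the slides'): with t_{ij} the IPC traffic between processes i and j, and A an assignment of processes to processors, choose A to minimize the traffic crossing the cut, subject to the memory and completion-time constraints above:

    \min_{A} \sum_{(i,j)\,:\,A(i) \neq A(j)} t_{ij}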

Processor Allocation

Up-down centralized algorithm

Centralized table that keeps "usage" data for each user; the users are defined to be the workstation owners. Call this the score for the user.

The goal is to give each user a fair share.

When user requests a remote job, if a
workstation is available it is assigned.

For each process a user has running remotely,
the user's score increases by a fixed amount
each time interval.

Processor Allocation

When a user has an unsatisfied request pending (and none being satisfied), the score decreases (it can go negative).
If no requests are pending and none are being satisfied,
the score is bumped towards zero.

When a processor becomes free, assign it to a
requesting user with the lowest score.
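A minimal sketch of the score bookkeeping in C (the constants and struct layout are my assumptions, not the algorithm's exact values): the coordinator updates every user's score once per time interval, and free processors go to the requesting user with the lowest score.

    #define PENALTY_PER_PROC 3     /* points added per remotely running process (assumed) */
    #define PENDING_CREDIT   1     /* points removed while a request goes unsatisfied (assumed) */

    struct user {
        int score;                 /* "usage" score kept in the centralized table */
        int remote_procs;          /* processes currently running remotely */
        int pending_requests;      /* requests not yet satisfied */
    };

    /* Run once per time interval for every user in the table. */
    void update_score(struct user *u) {
        if (u->remote_procs > 0)
            u->score += PENALTY_PER_PROC * u->remote_procs;
        else if (u->pending_requests > 0)
            u->score -= PENDING_CREDIT;          /* may go negative */
        else if (u->score > 0)
            u->score--;                          /* bump back toward zero */
        else if (u->score < 0)
            u->score++;
    }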

Processor Allocation

Hierarchical algorithm


Goal: assign multiple processors to a job

Quick idea of algorithm

Processors arranged in a tree

Requests go up the tree until a subtree has enough resources

Request is split and parts go back down the tree

Arrange processors in a hierarchy (tree)

This is a logical tree, independent of how the machines are physically connected.

Each node keeps (imperfect) track of how many available
processors are below it.

If a processor can run more than one process, the algorithm must be more sophisticated and must keep track of how many processes can be allocated (without overload) in the subtree below.

Processor Allocation

If a new request appears in the tree, the current node sees if it
can be satisfied by the processors below (plus itself).

If so, do it.

If not, pass the request up the tree.

Actually, since machines may be down or the data on availability may be out of date, you try to find more processors than are actually needed.
Once a request has gone high enough to be satisfied, the current node splits the request into pieces and sends each piece to the appropriate child.
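A minimal sketch of the upward/downward flow in C (the tree representation, the slack factor, and grant() are my assumptions): a node tries to satisfy a request from the processors below it; if it cannot, it passes the request to its parent, and once some ancestor can, the request is split among its children.

    #define MAX_CHILDREN 8

    struct node {
        struct node *parent;
        struct node *child[MAX_CHILDREN];
        int nchildren;
        int avail;                  /* (possibly stale) count of free processors below */
    };

    void grant(struct node *n, int want);   /* hypothetical: allocate on this subtree */

    void request(struct node *n, int want) {
        /* Ask for a few extra processors since 'avail' may be out of date. */
        int padded = want + want / 4;

        if (n->avail >= padded || n->parent == NULL) {
            /* Satisfied here (or we are the root): split among the children. */
            for (int i = 0; i < n->nchildren && want > 0; i++) {
                int share = n->child[i]->avail < want ? n->child[i]->avail : want;
                if (share > 0) {
                    grant(n->child[i], share);
                    want -= share;
                }
            }
        } else {
            request(n->parent, want);        /* not enough below: go up the tree */
        }
    }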

What if a node dies?

Promote one of its children say C

Now C's children are peers with the previous peers of C

Processor Allocation

If this is considered too unbalanced, we can promote one of C's children to take C's place.

How can we decide which child C to promote?

Peers of dead node have an election

Children of dead node have an election

Parent of dead node decides

What if the root dies?

Must use children since no peers or parent

If we want to use peers, then we do not have a single root

I.e. the top level of the hierarchy is a collection of roots that
communicate. This is a forest, not a tree

What if multiple requests are generated simultaneously?

Processor Allocation

This gets hard fast, as information gets stale and race conditions and deadlocks are possible.

Distributed heuristic algorithm


Goal: find a lightly loaded processor to migrate the job to

Send probe to a random processor

If the remote load is low, ship the job

If the remote load is high, try another random probe

If after k probes (k is a parameter of the implementation) all say the load is too high, give up and run the job locally.

Modelled analytically and seen to work fairly well
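A minimal sketch of the sender-initiated probe loop in C (the messaging helpers, the threshold, and K are illustrative assumptions):

    #define K_PROBES        3      /* how many random machines to try (assumed) */
    #define LOAD_THRESHOLD  2.0    /* "lightly loaded" cutoff (assumed) */

    int    pick_random_machine(void);          /* hypothetical */
    double probe_remote_load(int machine);     /* hypothetical: ask for its load */
    void   ship_job(int machine, int job);     /* hypothetical: migrate the job */
    void   run_locally(int job);               /* hypothetical */

    void place_job(int job) {
        for (int i = 0; i < K_PROBES; i++) {
            int m = pick_random_machine();
            if (probe_remote_load(m) < LOAD_THRESHOLD) {
                ship_job(m, job);              /* found a lightly loaded machine */
                return;
            }
        }
        run_locally(job);                      /* all K probes reported high load */
    }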


The general goal is to have processes that communicate frequently run simultaneously.

If they don’t and we use busy waiting for
messages, we will have a huge disaster.

Even if we use context switching, we may have a small disaster as only one message transfer can occur per scheduling time slot.

Co-scheduling (a.k.a. gang scheduling): processes belonging to a job are scheduled together.


Time slots are coordinated among the processors.

Some slots are for gangs; other slots are for regular processes.
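A small made-up example of such a coordinated timetable (four CPUs, four slots; gang A's processes A0-A3 and gang B's processes B0-B3 each run with all of their members scheduled in the same slot):

              slot 0    slot 1    slot 2    slot 3
    CPU 0     A0        B0        regular   A0
    CPU 1     A1        B1        regular   A1
    CPU 2     A2        B2        regular   A2
    CPU 3     A3        B3        idle      A3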