Slide Set 22 - Memory Management - R. DeBry


Memory Management

Managing Memory … The Simplest Case

[Diagram: physical memory from address 0 to 0xFFF…, with the O/S in low memory and the user program occupying the rest]

* Early PCs and mainframes
* Embedded systems

One user program at a time.

The logical address space is the same as the physical address space.

But in Modern Computer Systems

Modern memory managers subdivide memory to accommodate multiple processes.

Memory needs to be allocated efficiently, to pack as many processes into memory as possible.

When required, a process should be able to have exclusive use of a block of memory, or to share that block with other processes.

Primary memory is abstracted so that a program perceives that the memory allocated to it is a large array of contiguously addressed bytes (but it usually isn't).


Four Major Concerns of a Memory Manager

* Relocation
* Protection
* Sharing
* Physical Organization

Relocation

The programmer does not know where the program will be placed in memory when it is executed. While the program is executing, it may be swapped to disk and returned to main memory at a different location (relocated). Memory references in the code must be translated to actual physical memory addresses.

[Diagram: the process's logical address space mapped into the physical address space]

Protection

With multiple processes in memory and running "simultaneously", the system must protect one process from referencing memory locations in another process. This is more of a hardware responsibility than an operating system responsibility.

Linking and Loading a Program

[Diagram: Source → Compiler → Relocatable Object Module. Relocatable Object Modules and Library Modules → Linker → Load Module.]

All have their own:
* Program Segment
* Data Segment
* Stack Segment

In the load module, all references to data or functions are resolved…

[Diagram: Load Module → Loader → Process Image in Main Memory]

Absolute Loader

[Diagram: the load module (program and data, starting at address 0) is copied unchanged into the process image in main memory, also at address 0; no changes in addresses]

Static Binding

[Diagram: the load module (program and data, all addresses relative to 0) is loaded into the process image in main memory at offset 1000]

All addresses in the code segment are relative to 0. Add the offset to every address as the code is loaded.

With offset = 1000, "Jump 400" in the load module becomes "Jump 1400" in memory.
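A minimal C sketch of this load-time fix-up (not from the slides; the load-module and relocation-table layout here is hypothetical): the loader copies the code to its load address and adds the offset to every word that the relocation table marks as holding a relative address.

   #include <stddef.h>

   /* Hypothetical load module: code words plus a relocation table listing
    * which words contain addresses that are relative to 0.                */
   struct load_module {
       unsigned code[1024];     /* instruction words, addresses relative to 0  */
       size_t   code_len;
       size_t   reloc[128];     /* indices of words holding relative addresses */
       size_t   reloc_len;
   };

   /* Static binding: copy the image to the load offset and patch every
    * address field, e.g. "Jump 400" becomes "Jump 1400" at offset 1000. */
   void load_static(const struct load_module *m, unsigned *mem, unsigned offset) {
       for (size_t i = 0; i < m->code_len; i++)
           mem[offset + i] = m->code[i];            /* copy the code           */
       for (size_t r = 0; r < m->reloc_len; r++)
           mem[offset + m->reloc[r]] += offset;     /* add offset to addresses */
   }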

Dynamic Run-time Binding

[Diagram: the load module (program and data, all addresses relative to 0) is loaded into the process image in main memory at offset 1000]

All addresses in the code segment are relative to 0, and addresses are maintained in relative format: with offset = 1000, "Jump 400" in the load module stays "Jump 400" in memory. Address translation takes place on the fly at run time.

[Diagram: run-time translation hardware. The relative address in an instruction (e.g. "jump 400") is added to the Base Register (1000) by an adder, producing the absolute address 1400. A comparator checks the relative address against the Limit Register and raises an interrupt (segment error!) if it is out of range.]
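A minimal C sketch (not from the slides) of what the adder and comparator do for each memory reference under dynamic run-time binding:

   #include <stdio.h>
   #include <stdlib.h>

   static unsigned base_register  = 1000;   /* where the process was loaded        */
   static unsigned limit_register = 2000;   /* size of the process's address space */

   /* Translate a relative (logical) address to an absolute (physical) address.
    * The comparator raises a segment error if the address is out of range;
    * the adder produces base + relative, e.g. 1000 + 400 = 1400.              */
   unsigned translate(unsigned relative) {
       if (relative >= limit_register) {            /* comparator */
           fprintf(stderr, "segment error!\n");     /* interrupt  */
           exit(EXIT_FAILURE);
       }
       return base_register + relative;             /* adder */
   }

   int main(void) {
       printf("jump 400 -> absolute address %u\n", translate(400));   /* 1400 */
       return 0;
   }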

A Code Example

   . . .
   static int gVar;
   . . .
   int function_a(int arg)
   {
      . . .
      gVar = 7;
      libFunction(gVar);
      . . .
   }

static variables are stored in the data segment; the generated code will be stored in the code segment. libFunction( ) is defined in an external module, so at compile time we don't know the address of its entry point.

A Code Example

The compiler translates this source into a relocatable object module. All addresses are relative to 0:

Code Segment (relative addresses):
   0000  . . .
   0008  entry function_a
         . . .
   0220  load r1, 7
   0224  store r1, 0036
   0228  push 0036
   0232  call libFunction
         . . .
   0400  External Reference Table
         . . .
   0404  "libFunction"  ????
         . . .
   0500  External Definition Table
         . . .
   0540  "function_a"  0008
         . . .
   0600  Symbol Table
         . . .
   0799  (end of code segment)

Data Segment (relative addresses):
         . . .
   0036  [space for gVar]
         . . .
   0049  (end of data segment)


Each object file contains an external definition table indicating the relative entry point of every function it defines. The linker combines the object module above with the module containing libFunction (and any other modules) into a single load module, with all addresses still relative to 0:

Code Segment (relative addresses):
   0000  (other modules)
         . . .
   1008  entry function_a
         . . .
   1220  load r1, 7
   1224  store r1, 0136
   1228  push 0136
   1232  call 2334
         . . .
   1399  (end of function_a)
         . . .
         (other modules)
   2334  entry libFunction
         . . .
   2999  (end of code segment)

Data Segment (relative addresses):
         . . .
   0136  [space for gVar]
         . . .
   1000  (end of data segment)


Static binding with a load offset of 4000 converts these relative addresses into real addresses as the process image is built in main memory (the code segment is loaded at 4000, and the data segment follows it at 7000):

Code Segment (real addresses):
   4000  (other modules)
         . . .
   5008  entry function_a
         . . .
   5220  load r1, 7
   5224  store r1, 7136
   5228  push 7136
   5232  call 6334
         . . .
   5399  (end of function_a)
         . . .
         (other modules)
   6334  entry libFunction
         . . .
   6999  (end of code segment)

Data Segment (real addresses):
         . . .
   7136  [space for gVar]
         . . .
   8000  (end of data segment)

Sharing Memory

* Multiple processes (fork) running the same executable
* Shared memory
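As a small illustration (not from the slides), two processes created with fork( ) run the same executable, so the O/S can share the read-only code pages between them, while each gets its own copy of writable data:

   #include <stdio.h>
   #include <sys/types.h>
   #include <unistd.h>

   int main(void) {
       int x = 1;                 /* writable data: each process gets its own copy */
       pid_t pid = fork();        /* parent and child now run the same code        */
       if (pid == 0) {
           x = 99;                               /* changes only the child's copy  */
           printf("child:  x = %d\n", x);
       } else {
           printf("parent: x = %d\n", x);        /* still 1 in the parent          */
       }
       return 0;
   }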

Physical Organization

The flow of information between the various "levels" of memory.

Computer memory consists of a large array of words or bytes, each with its own address.

Registers built into the CPU are typically accessible in one clock cycle. Most CPUs can decode an instruction and perform one or more simple register operations in one clock cycle. The same is not true of memory operations, which can take many clock cycles.

[Diagram: the memory hierarchy, from fast but expensive to cheap but slow: Registers (1 machine cycle), Cache, RAM, Disk, Optical, Tape, etc.]

Memory Allocation

Before an address space can be bound to physical addresses, the memory manager must allocate the space in real memory where the address space will be mapped. There are a number of schemes for doing memory allocation.

Fixed Partitioning

Equal-size fixed partitions:

* any process whose size is less than or equal to the partition size can be loaded into an available partition
* if all partitions are full, the operating system can swap a process out of a partition
* a program may not fit in a partition; then the programmer must design the program with overlays


Fixed Partitioning

Main memory use is inefficient. Any program, no matter how small, occupies an entire partition. This is called internal fragmentation.

But . . . it's easy to implement.


Placement Algorithm with Fixed-Size Partitions

Equal-size partitions: because all partitions are of equal size, it does not matter which partition is used. Placement is trivial.

An example is OS/360 MFT, where the operator fixed the partition sizes at system start-up.

Two options:

* Separate Input Queues
* Single Input Queue

Fixed Partitions with Different Sizes: Multiple Input Queues

[Diagram: 800K of memory with the O/S below 100K, Partition 1 from 100K to 200K, Partition 2 from 200K to 400K, Partition 3 from 400K to 700K, and Partition 4 from 700K to 800K, each partition with its own input queue]

Jobs are put into the queue for the smallest partition big enough to hold them.

Disadvantage? Memory can go unused, even though there are jobs waiting to run that would fit.

Single Input Queue

[Diagram: the same partition layout, fed by a single input queue]

When a partition becomes free, pick the first job on the queue that fits.

Disadvantage? Small jobs can be put into much larger partitions than they need, wasting memory space.

Single Input Queue

[Diagram: the same partition layout, fed by a single input queue]

Alternative solution: scan the whole queue and find the job that best fits the free partition.

Disadvantage? This discriminates against small jobs. Starvation.


CPU Utilization

From a probabilistic point of view …

Suppose that a process spends a fraction p of its time waiting for I/O to complete. With n processes in memory at once, the probability that all n processes are waiting for I/O (in which case the CPU is idle) is p^n.

CPU utilization is therefore given by the formula

   CPU utilization = 1 - p^n

Consider the case where processes spend 80% of their time waiting for I/O (not unusual in an interactive end-user system where most time is spent waiting for keystrokes). Notice that it requires at least 10 processes to be in memory to achieve roughly 90% CPU utilization.

Predicting Performance

Suppose you have a computer that has 32MB of memory and that the operating system uses 16MB. If user programs average 4MB, we can then hold 4 jobs in memory at once. With an 80% average I/O wait:

   CPU utilization = 1 - 0.8^4 ≈ 60%

Adding 16MB of memory allows us to have 8 jobs in memory at once, so

   CPU utilization = 1 - 0.8^8 ≈ 83%

Adding a second 16MB would only increase CPU utilization to 93%.
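As a quick check of these numbers, here is a small C sketch (not from the slides) that evaluates 1 - p^n for p = 0.8 and several degrees of multiprogramming:

   #include <stdio.h>
   #include <math.h>

   int main(void) {
       const double p = 0.8;             /* fraction of time each process waits on I/O */
       const int n_values[] = {4, 8, 12};

       for (int i = 0; i < 3; i++) {
           int n = n_values[i];
           double utilization = 1.0 - pow(p, n);   /* probability the CPU is busy */
           printf("n = %2d  ->  CPU utilization = %2.0f%%\n", n, 100.0 * utilization);
       }
       return 0;                          /* prints roughly 59%, 83%, and 93% */
   }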

Dynamic Partitioning

Partitions are of variable length and number. A process is allocated exactly as much memory as it requires.

Eventually you get holes in the memory. This is called external fragmentation.

You must use compaction to shift processes so they are contiguous and all free memory is in one block.


For Example …

[Diagram sequence: dynamic partitioning over time. Initially the O/S occupies 8M and 56M is free. Process 1 (20M), Process 2 (14M), and Process 3 (8M) are loaded in turn. Later, processes are swapped out and smaller ones, Process 4 (4M) and Process 5 (4M), are loaded into the holes they leave, until free memory is scattered across several small, non-contiguous blocks (10M, 16M, 18M).]

Fragmentation!

Periodically the O/S could do memory compaction, like disk compaction: copy all of the blocks of code for loaded processes into contiguous memory locations, thus opening larger unused blocks of free memory.

The problem is that this is expensive!

A related question: how much memory do you allocate to a process when it is created or swapped in? In most modern computer languages, data can be created dynamically.

The Heap

This may come as a surprise….

Dynamic memory allocation with malloc, or new, does not really cause system memory to be dynamically allocated to the process.

In most implementations, the linker anticipates the use of dynamic memory and reserves space to honor such requests. The linker reserves space for both the process's run-time stack and its heap. Thus a malloc( ) call returns an address within the existing address space reserved for the process.

Only when this space is used up does a system call to the kernel take place to get more memory. The address space may have to be rebound, a very expensive process.
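A small C experiment along these lines (not from the slides; the result depends on the libc allocator, and it assumes a POSIX system where sbrk is available). On many systems the first malloc grows the program break once to set up the heap arena, and later small allocations are served from that already-reserved space without further system calls:

   #include <stdio.h>
   #include <stdlib.h>
   #include <unistd.h>

   static void show(const char *label) {
       printf("%-22s program break = %p\n", label, sbrk(0));  /* sbrk(0) = current break */
   }

   int main(void) {
       show("at start:");
       void *a = malloc(64);      /* may move the break once to create the arena  */
       show("after first malloc:");
       void *b = malloc(64);      /* typically served from space already reserved */
       show("after second malloc:");
       free(a);
       free(b);
       return 0;
   }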

Managing Dynamically Allocated Memory

When managing memory dynamically, the operating system must keep track of the free and used blocks of memory.

Common methods used are bitmaps and linked lists.

Linked List Allocation

Memory is divided up into some number of fixed-size allocation units. Keep the list in order, sorted by address. Each node records whether the block holds a process (P) or is a hole (H), the allocation unit at which it starts, and its length:

   P 0 5 → H 5 3 → P 8 6 → P 14 4 → H 18 2 → P 20 6 → P 26 3 → H 29 3 → X

Linked List Allocation

   P 0 5 → H 5 3 → P 8 6 → P 14 4 → H 18 2 → P 20 6 → P 26 3 → H 29 3 → X

When a process ends (here, P 20 6), just merge its node with the hole next to it (if one exists). We want contiguous blocks!

Linked List Allocation

   P 0 5 → H 5 3 → P 8 6 → P 14 4 → H 18 8 → P 26 3 → H 29 3 → X

When blocks are managed this way, there are several algorithms that the O/S can use to find blocks for a new process, or for one being swapped in from disk.
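A minimal C sketch (not from the slides) of such a list node and of the merge step, assuming the list is kept sorted by starting address:

   #include <stdbool.h>
   #include <stdlib.h>

   /* One entry in the allocation list, kept sorted by starting address. */
   struct block {
       bool is_hole;        /* true = hole (H), false = process (P) */
       int start;           /* first allocation unit of the block   */
       int length;          /* length in allocation units           */
       struct block *next;  /* next block by address, NULL at end   */
   };

   /* Merge a hole with the hole that immediately follows it, if any. */
   static void merge_with_next(struct block *b) {
       if (b->is_hole && b->next != NULL && b->next->is_hole) {
           struct block *h = b->next;
           b->length += h->length;
           b->next = h->next;
           free(h);
       }
   }

   /* When a process terminates, its block becomes a hole and is coalesced with
    * adjacent holes: freeing P 20 6 next to H 18 2 yields H 18 8, as above.   */
   void free_block(struct block *head, struct block *b) {
       b->is_hole = true;
       merge_with_next(b);                                   /* hole after b  */
       for (struct block *p = head; p != NULL; p = p->next)
           if (p->next == b) { merge_with_next(p); break; }  /* hole before b */
   }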

Dynamic Partitioning Placement Algorithms

Best-fit algorithm

Search the entire list and choose the smallest block that will hold the request. This algorithm is the worst performer overall: since the smallest possible block is found for a process, it tends to leave lots of tiny holes that are not useful.

[Diagram: the request is placed in the smallest block it fits in, leaving a tiny hole]

Dynamic Partitioning Placement Algorithms

Worst-fit: a variation of best-fit

This scheme is like best-fit, but when looking for a new block it picks the largest block of unallocated memory. The idea is that external fragmentation will result in bigger holes, so it is more likely that another block will fit.

[Diagram: the request is placed in the largest block of unallocated memory, leaving a big hole]

Dynamic Partitioning Placement Algorithms

First-fit algorithm

Finds the first block in the list that will fit. May end up with many processes loaded in the front end of memory that must be searched over when trying to find a free block.


Dynamic Partitioning Placement Algorithms

Next-fit: a variation of first-fit

This scheme is like first-fit, but when looking for a new block it begins its search where it left off the last time. This algorithm actually performs slightly worse than first-fit.
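A small self-contained C sketch (not from the slides; the hole sizes are made up) contrasting first-fit with next-fit, whose roving index persists between searches:

   #include <stdio.h>

   #define NHOLES 5
   static int holes[NHOLES] = {6, 3, 9, 2, 7};    /* hypothetical hole lengths */

   /* First-fit: always scan from the start of the list. */
   int first_fit(int request) {
       for (int i = 0; i < NHOLES; i++)
           if (holes[i] >= request) return i;
       return -1;                                 /* nothing fits */
   }

   /* Next-fit: resume scanning where the previous search left off. */
   int next_fit(int request) {
       static int rover = 0;                      /* persists between calls */
       for (int k = 0; k < NHOLES; k++) {
           int i = (rover + k) % NHOLES;
           if (holes[i] >= request) {
               rover = (i + 1) % NHOLES;          /* start here next time */
               return i;
           }
       }
       return -1;
   }

   int main(void) {
       printf("first-fit(5) -> hole %d\n", first_fit(5));  /* hole 0 (length 6)       */
       printf("next-fit(5)  -> hole %d\n", next_fit(5));   /* hole 0 the first time   */
       printf("next-fit(5)  -> hole %d\n", next_fit(5));   /* hole 2 (length 9) after */
       return 0;
   }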


Swapping

Used primarily in timeshared systems with single-threaded processes.

Optimizes system performance by removing a process from memory when its thread is blocked.

When a process is moved to the ready state, the process manager notifies the memory manager so that the address space can be swapped in again when space is available.

Requires relocation hardware.

Swapping can also be used when the memory requirements of the processes running on the system exceed available memory.

System Costs to do Swapping

If a process requires S units of primary storage, and a disk block holds D units of primary storage, then

   ceiling(S / D)

disk writes are required to swap the address space to disk. The same number of reads is required to swap the address space back into primary storage.

For example, if a process is using 1000 bytes of memory, and disk blocks are 256 bytes, then 4 disk writes are required.
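In C, this ceiling can be computed with integer arithmetic; a one-function sketch (not from the slides):

   #include <stdio.h>

   /* Disk writes needed to swap out S bytes when a disk block holds D bytes:
    * ceiling(S / D), computed without floating point.                        */
   unsigned blocks_needed(unsigned S, unsigned D) {
       return (S + D - 1) / D;
   }

   int main(void) {
       printf("%u\n", blocks_needed(1000, 256));   /* prints 4, as in the example */
       return 0;
   }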

Suppose that a process requiring S units of primary storage is blocked for T units of time. The resource wasted because the process stays in memory is S x T.

What criteria would you use to determine whether or not to swap the process out of primary storage?

How big is S? If it is small, then the amount of storage made available for other processes to use is small, and another process may not fit. Swapping would be wasteful if there is not a process that would fit in the storage made available.

If T is small, then the process will begin competing for primary storage too quickly to make the swap effective. If T < R, the process will begin requesting memory before it is even completely swapped out (R is the time required to swap).

For swapping to be effective, T must be considerably larger than 2R for every process that the memory manager chooses to swap out, and S must be large enough for other processes to execute.

S is known. Can T be predicted?

When a process is blocked on a slow I/O device, the memory manager can estimate a lower bound.

What about when a process is blocked by a semaphore operation?

Example Test Questions

A memory manager for a variable-sized region strategy has a free list of memory blocks of the following sizes:

   600, 400, 1000, 2200, 1600, 2500, 1050

Which block will be selected to honor a request for 1603 bytes using a best-fit policy?  2200

Which block will be selected to honor a request for 949 bytes using a best-fit policy?  1000

Which block will be selected to honor a request for 1603 bytes using a worst-fit policy?  2500
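To sanity-check these answers, here is a small C sketch (not part of the original questions) that applies best-fit and worst-fit to the free list above:

   #include <stdio.h>

   #define NBLOCKS 7
   static const int blocks[NBLOCKS] = {600, 400, 1000, 2200, 1600, 2500, 1050};

   /* Best-fit: the smallest free block that still holds the request (0 if none). */
   int best_fit(int request) {
       int best = 0;
       for (int i = 0; i < NBLOCKS; i++)
           if (blocks[i] >= request && (best == 0 || blocks[i] < best))
               best = blocks[i];
       return best;
   }

   /* Worst-fit: the largest free block that holds the request (0 if none). */
   int worst_fit(int request) {
       int worst = 0;
       for (int i = 0; i < NBLOCKS; i++)
           if (blocks[i] >= request && blocks[i] > worst)
               worst = blocks[i];
       return worst;
   }

   int main(void) {
       printf("best-fit(1603)  = %d\n", best_fit(1603));    /* 2200 */
       printf("best-fit(949)   = %d\n", best_fit(949));     /* 1000 */
       printf("worst-fit(1603) = %d\n", worst_fit(1603));   /* 2500 */
       return 0;
   }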

If you were designing an operating system and had to determine the best way to sort the free block list, how would you sort it for each of the following policies, and why?

Best-fit: smallest to largest free block

Worst-fit: largest to smallest free block