Memory Management Overview

streambaby, Software and s/w Development

Dec 14, 2013

Memory Management
Overview


Basic memory “management”


Address Spaces


Virtual memory


Page replacement algorithms


Design issues for paging systems


Implementation issues


Segmentation

Memory Management




Ideally programmers want memory that is



large


fast


non-volatile


Memory hierarchy



small amount of fast, expensive memory – cache


some medium-speed, medium price main memory


gigabytes of slow, cheap disk storage


Memory manager


handles the memory hierarchy


Protects processes from each other.

Approaches


Single Process, Contiguous Memory


Multiple Processes, Contiguous Memory


Multiple Processes, “Discontiguous” Memory


Multiple Processes, Only partially in memory
Basic Memory Management
Single Process without Swapping or Paging
Three simple ways of organizing memory: an operating system with one user process

Binding


If a program has a line:

int x;

When is the address of x determined?

What are the choices?

At compile time

At load time

At run time
Base / Limit Registers


Binding done at run time.

Addresses are added to the base value to map to a physical address.

Addresses larger than the limit value are an error.
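
The base/limit check can be sketched in a few lines. This is only an illustration: the struct and function names are made up, and it assumes the limit register holds the size of the process's memory, so valid addresses run from 0 to limit - 1.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the two registers; the names are illustrative.
struct BaseLimit {
    std::uint32_t base;   // physical address where the process is loaded
    std::uint32_t limit;  // assumed here: size of the process's memory in bytes
};

// Run-time binding: writes the physical address to `phys` and returns true,
// or returns false when the address is out of range (hardware would trap).
bool translate(const BaseLimit& regs, std::uint32_t addr, std::uint32_t& phys) {
    if (addr >= regs.limit) return false;
    phys = regs.base + addr;
    return true;
}
```

Because the addition happens on every reference, the OS can move a process just by changing its base value.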
Swapping
Memory allocation changes as processes come into memory and leave memory

Shaded regions are unused memory

Managing Free Memory


Assume a process being loaded can ask for any size “chunk” of memory needed.

We need to be able to find a chunk of the right size.

How can we keep track of free chunks efficiently?
Memory Management with Bit Maps



Part of memory with 5 processes, 3 holes



tick marks show allocation units


shaded regions are free


Corresponding bit map



Same information as a list
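
A rough sketch of how a bit map might be searched for a hole of a given size. The representation (one bool per allocation unit, true meaning allocated) and the function name are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the index of the first run of `n` consecutive free units,
// or -1 if no such run exists. true = allocated, false = free.
long find_free_run(const std::vector<bool>& bitmap, std::size_t n) {
    std::size_t run = 0;  // length of the current run of free units
    for (std::size_t i = 0; i < bitmap.size(); ++i) {
        run = bitmap[i] ? 0 : run + 1;
        if (n > 0 && run == n) return static_cast<long>(i + 1 - n);
    }
    return -1;
}
```

The linear scan is the main cost of the bit map approach: finding a run of n free units may require looking at the whole map.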

Memory Management with Linked Lists

Four neighbor combinations for the terminating process X
Order of Search for Free Memory


We can search for a large enough block of free memory starting from the beginning. That’s called first fit.

If we already skipped over the first N holes because they were too small, maybe it’s a waste of time to look there again. Try next fit.

Should we be more selective in our choice? After all, we’re just grabbing the first thing that works…

How does first (or next) fit affect fragmentation? Could we do better?

Can we find a hole that fits faster? What are the downsides?
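
To make the trade-off concrete, here is a minimal sketch of first fit next to best fit (the "more selective" choice of the smallest hole that is large enough). The Hole struct and the function names are hypothetical, and the free list is simplified to a vector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Hole { std::size_t start; std::size_t size; };  // illustrative free-list entry

// First fit: index of the first hole that is large enough, or -1.
int first_fit(const std::vector<Hole>& holes, std::size_t request) {
    for (std::size_t i = 0; i < holes.size(); ++i)
        if (holes[i].size >= request) return static_cast<int>(i);
    return -1;
}

// Best fit: index of the smallest hole that is large enough, or -1.
int best_fit(const std::vector<Hole>& holes, std::size_t request) {
    int best = -1;
    for (std::size_t i = 0; i < holes.size(); ++i) {
        if (holes[i].size < request) continue;
        if (best < 0 || holes[i].size < holes[best].size) best = static_cast<int>(i);
    }
    return best;
}
```

First fit can stop at the first match, while best fit always scans the whole list and tends to leave very small leftover holes.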
The Contiguous Constraint


So far a process’s memory has been contiguous.

What if it didn’t have to be?

What problems would that help solve?

How would the hardware need to change?

What additional work would the OS have to do?
It’s All Gotta Be in Memory
(or does it?)


We have assumed that the entire process has to be in memory whenever it is running.

What if it didn’t have to be?

What problems would that help solve?

How would the hardware need to change?

What additional work would the OS have to do?
Virtual Memory

The position and function of the MMU
Paging
The relation between virtual addresses and physical memory addresses is given by the page table
Page Tables

Internal operation of MMU with 16 4 KB pages
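
A minimal sketch of what the MMU does with 16 pages of 4 KB: the top 4 bits of a 16-bit virtual address select the page table entry and the low 12 bits pass through as the offset. The entry layout and names below are assumptions, not the actual hardware format.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

struct PageTableEntry {
    bool present;         // present/absent bit
    std::uint16_t frame;  // page frame number
};

// Translates a 16-bit virtual address. Returns false on a page fault.
bool mmu_translate(const std::array<PageTableEntry, 16>& table,
                   std::uint16_t vaddr, std::uint32_t& phys) {
    unsigned page = vaddr >> 12;        // top 4 bits: virtual page number
    unsigned offset = vaddr & 0x0FFFu;  // low 12 bits: offset within the page
    if (!table[page].present) return false;
    phys = (static_cast<std::uint32_t>(table[page].frame) << 12) | offset;
    return true;
}
```

For example, if virtual page 2 maps to frame 6, virtual address 8196 (page 2, offset 4) becomes physical address 24580.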
Page Tables: 2-level

32 bit address with 2 page table fields

Two-level page tables (top-level page table, second-level page tables)
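
Splitting a 32-bit address into its fields might look like the sketch below. The 10/10/12 split (two 10-bit page table indexes and a 12-bit offset) is an assumption, since the slide does not state the field widths; the names are illustrative.

```cpp
#include <cassert>
#include <cstdint>

struct VAddrParts {
    std::uint32_t pt1;     // index into the top-level page table
    std::uint32_t pt2;     // index into the selected second-level table
    std::uint32_t offset;  // byte offset within the page
};

// Assumes a 10-bit PT1 field, a 10-bit PT2 field and a 12-bit offset.
VAddrParts split_address(std::uint32_t vaddr) {
    return { vaddr >> 22, (vaddr >> 12) & 0x3FFu, vaddr & 0xFFFu };
}
```

Only the second-level tables that are actually needed have to be kept in memory, which is the point of the extra level.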

Page Table Entry


Present/absent is also called Valid


Modified is also called Dirty


Referenced is also called Accessed


Why would caching be disabled?
Pentium PTE
TLBs – Translation Lookaside Buffers

A TLB to speed up paging
Inverted Page Tables
Comparison of a traditional page table with an inverted page table
Page Replacement Algorithms



Page fault forces choice



If there are no free page frames, we have to make room for the incoming page

Which page should be removed?

A modified page frame must first be saved before being evicted

An unmodified page frame can just be overwritten


Better not to choose an often used page



Likely to be brought back in again soon
Optimal Page Replacement Algorithm



Replace page needed at the farthest point in future



Optimal but unrealizable


Estimate by …


logging page use on previous runs of process


although this is impractical

Not Recently Used Page Replacement Algorithm


Each page has Reference bit, Modified bit


bits are set when page is referenced, modified


Pages are classified

1. not referenced, not modified
2. not referenced, modified
3. referenced, not modified
4. referenced, modified

NRU removes a page at random from the lowest-numbered non-empty class
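
The classification can be sketched as follows, using the same 1 to 4 numbering as the list above. The struct and function names are hypothetical; the sketch returns all candidates in the lowest class and leaves the random pick to the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct PageBits { bool referenced; bool modified; };  // R and M bits of one page

// Class 1 = not referenced, not modified ... class 4 = referenced, modified.
int nru_class(const PageBits& p) {
    return 2 * (p.referenced ? 1 : 0) + (p.modified ? 1 : 0) + 1;
}

// Indices of all pages in the lowest-numbered non-empty class.
std::vector<std::size_t> nru_candidates(const std::vector<PageBits>& pages) {
    int lowest = 5;  // one more than the highest class
    for (const PageBits& p : pages)
        if (nru_class(p) < lowest) lowest = nru_class(p);
    std::vector<std::size_t> out;
    for (std::size_t i = 0; i < pages.size(); ++i)
        if (nru_class(pages[i]) == lowest) out.push_back(i);
    return out;
}
```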
FIFO Page Replacement Algorithm



Maintain a linked list of all pages



in order they came into memory



Page at beginning of list replaced



Disadvantage



page in memory the longest may be often used
Second Chance Page Replacement Algorithm


Operation of second chance

pages sorted in FIFO order

Page list if a fault occurs at time 20 and A has its R bit set (numbers above pages are loading times)
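
A small sketch of the second chance idea, assuming the FIFO list is a deque with the oldest page at the front. The names are illustrative.

```cpp
#include <cassert>
#include <deque>

struct Page { char name; bool referenced; };  // referenced = R bit

// Removes and returns the victim. A page whose R bit is set has the bit
// cleared and is moved to the back, as if it had just been loaded.
char second_chance_evict(std::deque<Page>& fifo) {
    assert(!fifo.empty());
    while (fifo.front().referenced) {
        Page p = fifo.front();
        fifo.pop_front();
        p.referenced = false;
        fifo.push_back(p);
    }
    char victim = fifo.front().name;
    fifo.pop_front();
    return victim;
}
```

If every page has its R bit set, the loop clears them all in one pass and the algorithm degenerates to plain FIFO. The clock algorithm keeps the same policy but uses a circular list and a moving hand instead of shuffling list entries.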
The Clock Page Replacement Algorithm
Least Recently Used (LRU)


Principle: assume a page used recently will be used again soon

throw out the page that has been unused for the longest time

Implementation

Keep a linked list of pages

most recently used at front, least at rear

update this list on every memory reference!

Or keep a time stamp with each PTE

choose the page with the oldest time stamp

Again, this must be updated with every memory reference.
LRU
(Another Hardware Solution)
LRU using a matrix – pages referenced in order 0,1,2,3,2,1,0,3,2,3
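
The matrix method can be simulated in software to see why it works. In this sketch (names and the 4-page size are assumptions) each row is stored as a bit mask: referencing page k sets row k to all ones and then clears column k in every row, and the row with the smallest value belongs to the least recently used page.

```cpp
#include <array>
#include <cassert>

constexpr int kPages = 4;  // assumed number of pages tracked

class LruMatrix {
public:
    void reference(int k) {
        assert(k >= 0 && k < kPages);
        rows_[k] = (1u << kPages) - 1;                   // set row k to all ones
        for (unsigned& row : rows_) row &= ~(1u << k);   // clear column k
    }
    // Page whose row has the lowest value (first one on ties).
    int lru() const {
        int best = 0;
        for (int i = 1; i < kPages; ++i)
            if (rows_[i] < rows_[best]) best = i;
        return best;
    }
private:
    std::array<unsigned, kPages> rows_{};  // one bit row per page, initially zero
};
```

After the reference string 0,1,2,3,2,1,0,3,2,3 the least recently used page is page 1, which is what `lru()` reports.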
LRU Approximation?


How could we approximate LRU?

We can’t track every time a page is referenced, but we can sample the data.

How often? Once per clock tick, perhaps.

Update a counter for each page that has been referenced in the last clock tick.

Take the page with the lowest count, i.e. “Not Frequently Used” (NFU)


How well does this work?
Simulating LRU in Software: Aging



The aging algorithm simulates LRU in software


Note 6 pages for 5 clock ticks, (a) – (e)
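
A sketch of aging under the usual formulation: at each clock tick every counter is shifted right one bit and the page's R bit is inserted at the left, then R is cleared. The 8-bit counter width and all names are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct AgedPage {
    std::uint8_t counter = 0;  // age counter (8 bits assumed)
    bool referenced = false;   // R bit, set when the page is accessed
};

// One clock tick: shift right, put R into the leftmost bit, clear R.
void clock_tick(std::vector<AgedPage>& pages) {
    for (AgedPage& p : pages) {
        p.counter = static_cast<std::uint8_t>((p.counter >> 1) |
                                              (p.referenced ? 0x80 : 0x00));
        p.referenced = false;
    }
}

// The page with the lowest counter is the eviction candidate.
std::size_t pick_victim(const std::vector<AgedPage>& pages) {
    assert(!pages.empty());
    std::size_t victim = 0;
    for (std::size_t i = 1; i < pages.size(); ++i)
        if (pages[i].counter < pages[victim].counter) victim = i;
    return victim;
}
```

Unlike NFU, a reference in the most recent tick outweighs any number of older references, so old history fades out.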
Thrashing


A program causing page faults every few instructions is said to be thrashing.

What causes thrashing?

If a process keeps accessing random new pages, then it is hard to anticipate what it will use next.

Most programs exhibit temporal and spatial locality.

Temporal locality: if the process accessed a particular address, it is likely to do so again soon.

Spatial locality: if the process accessed a particular address, it is likely to access nearby addresses soon.

Processes that follow this principle tend not to thrash unless they have to fight for memory.
Causing Thrashing
within a single process


The first nested loop demonstrates spatial locality


The second thrashes.
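const int ROWS = 10000;
const int COLS = 1024;
int arr[ROWS][COLS];  // about 40 MB with 4-byte ints

int main() {
    // Row-major traversal: consecutive accesses touch adjacent addresses.
    for (int row = 0; row < ROWS; ++row)
        for (int col = 0; col < COLS; ++col)
            arr[row][col] = row * col;

    // Column-major traversal: consecutive accesses are a full row
    // (COLS ints) apart, so each one may land on a different page.
    for (int col = 0; col < COLS; ++col)
        for (int row = 0; row < ROWS; ++row)
            arr[row][col] = row * col;
}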
The Working Set Page Replacement Algorithm (1)


The working set is the set of pages used by the k most recent memory references

w(k,t) is the size of the working set at time t
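
The definition of w(k,t) can be checked against a reference string with a short sketch. It assumes time t is simply the index of a reference in the string; the function name is made up.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// w(k,t): number of distinct pages among the k most recent references,
// where refs[t] is the reference made at time t.
std::size_t working_set_size(const std::vector<int>& refs,
                             std::size_t t, std::size_t k) {
    std::set<int> pages;
    std::size_t begin = (t + 1 > k) ? t + 1 - k : 0;  // start of the window
    for (std::size_t i = begin; i <= t && i < refs.size(); ++i)
        pages.insert(refs[i]);
    return pages.size();
}
```

Because a larger window can only add pages, w(k,t) never decreases as k grows.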

The Working Set Page Replacement Algorithm (2)
The working set algorithm
The WSClock Page Replacement Algorithm
Operation of the WSClock algorithm
Review of Page Replacement Algorithms
Modeling Page Replacement Algorithms
Belady's Anomaly


FIFO with 3 page frames


FIFO with 4 page frames


P's show which page references cause page faults
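
Belady's anomaly is easy to reproduce with a small FIFO simulation. The reference string used in the note below is the commonly cited one and is an assumption here, since the slide's figure is not reproduced in the text.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Counts page faults for FIFO replacement with the given number of frames.
int fifo_faults(const std::vector<int>& refs, std::size_t frames) {
    assert(frames > 0);
    std::deque<int> loaded;  // oldest page at the front
    int faults = 0;
    for (int page : refs) {
        if (std::find(loaded.begin(), loaded.end(), page) != loaded.end())
            continue;  // hit: FIFO order is not changed by a reference
        ++faults;
        if (loaded.size() == frames) loaded.pop_front();  // evict the oldest
        loaded.push_back(page);
    }
    return faults;
}
```

With the reference string 0 1 2 3 0 1 4 0 1 2 3 4, three frames give 9 faults while four frames give 10.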

“Stack” Algorithms
State of memory array, M, after each item in reference string is processed
The Distance String
Probability density functions for two hypothetical distance strings
The Distance String


Computation of page fault rate from distance string


the C vector

the F vector
Design Issues for Paging Systems
Local versus Global Allocation Policies



Original configuration


Local page replacement


Global page replacement

Page Fault Rate


Page fault rate as a function of the number of page frames assigned


Use to determine if a process should be granted additional pages.
Load Control


Despite good designs, system may still thrash

When the PFF (page fault frequency) algorithm indicates

some processes need more memory

but no processes need less

Solution: Reduce number of processes competing for memory


swap one or more to disk, divide up frames they held


reconsider degree of multiprogramming
Page Size (1)
Small page size


Advantages


less internal fragmentation


better fit for various data structures, code sections


less unused program in memory


Disadvantages


programs need many pages, larger page tables
Page Size (2)


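Overhead due to page table and internal fragmentation:

$\text{overhead} = \underbrace{\frac{s \cdot e}{p}}_{\text{page table space}} + \underbrace{\frac{p}{2}}_{\text{internal fragmentation}}$

Where

s = average process size in bytes

p = page size in bytes

e = size of a page table entry in bytes

Optimized when $p = \sqrt{2se}$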
Separate Instruction and Data Spaces


One address space


Separate I and D spaces
Shared Pages
Two processes sharing same program sharing its page table
Cleaning Policy


Need for a background process, paging daemon


periodically inspects state of memory


When too few frames are free


selects pages to “evict” using a replacement algorithm.

Evicted pages are kept in a pool in case they are wanted again.


Dirty pages are scheduled to be written out.
Implementation Issues
Operating System vs. Hardware for Paging
1. Process creation

Create and initialize page table.

Pre-fetch pages.

2. Context switch

Point MMU to page table for new process

TLB flushed

3. Memory reference

Map virtual address to physical address

Determine if page fault

If fault, determine whose fault and resolve

4. Page replacement

Record page accesses / modifications

Determine page to replace.

5. Process termination time

Release page table, pages
Instruction Backup
An instruction causing a page fault
Global Flag


Some pages are used by every process in the system.

Which?

What special treatment would we like those pages to get during a context switch?


How can the hardware know?
Locking Pages in Memory


Virtual memory and I/O occasionally interact


Proc issues call for read from device into buffer


while waiting for I/O, another process starts up


has a page fault


buffer for the first proc may be chosen to be paged out


Need to specify some pages locked


exempted from being target pages
Backing Store
(a) Paging to static swap area
(b) Backing up pages dynamically

Separation of Policy and Mechanism
Page fault handling with an external pager
Memory Management
NT & Unix


Exact details not as easy as in process scheduling.


Attempt to keep a set of “free pages” using a “page daemon” (e.g. Linux: kswapd, NT: Working Set Manager).


Essentially “demand paging”.


NT brings in “clusters”, typically 8 pages for code, 4 for data.


Unix’s swapper used to bring in all referenced pages when swapping a process back in. Now uses demand paging.

OS present in every process’s virtual address space.


Reduces changes needed in TLB and cache.


No changes needed for system call.


Load control: Swap out entire processes “as needed”. This is currently something that one hopes to avoid, but the feature is still present.

Some structure to describe where pages are currently, e.g. location in paging file. Unix: Core Map. NT: Page Frame Database.


A portion (e.g., 64KB) of the top/bottom of address space unmapped.
MM in Unix


Early versions totally swapping


Current based on paging.


Page daemon runs every ¼ sec.

Refill free list when less than some value.

BSD: lotsfree = ¼ memory.

SysV: has some min/max. If less than min, fill until max. Goal is to reduce frequency of page daemon, thereby reducing thrashing.


Originally used Global Clock.


As memory sizes increased BSD went to 2-handed clock.


Front hand clears referenced bit

Back hand picks victim

Result: a referenced page is really recent.

SysV instead keeps a count of how many times a page was found not referenced.


Only boots page out if count greater than some value


Seems they had the opposite problem of the BSD folk.
MM in Unix


Load Control.


If page daemon has to work too often then swap someone out.

Every few seconds look for someone to bring back.

Decision based on swapped out time, size, niceness, if it was sleeping…

Only bring back the “user structure” portion of PCB and the page tables. (I’m surprised page tables are removed. Must record swapped out page for process somewhere…)

“Core Map” describes frames, free list, location to swap to.

MM in NT


Demand paging of “Clusters”.


(code: 8, data: 4). XP does some prepaging after application’s first run.


Maintains a “working set” per process, not thread (remember scheduling is per thread)

WS is based on size, not time window.

Keeps a min/max per process

If we have > max and a page fault then steal process’s page. Otherwise add.

Every ~1 sec. the Balance Set Manager checks for sufficient free pages. Calls Working Set Manager to free pages.

Sort processes by “desirability”. Large idle first. Foreground last.

If < min and “enough page faults” since last check, then leave alone.

Determine #pages to remove from process. Depends on need, size of WS relative to min/max.

Examine all pages in a process. Uses an unreferenced count, like AT&T Unix.

Pages with “high” unreferenced count removed.

Continue examining additional processes till sufficient pages free. More aggressive passes as needed.
MM in NT


Load Control.


Swapper runs every 4 sec.


If thread asleep for 7 sec. Mark thread’s kernel stack “in transition”.


When all threads in process marked, then swap out.
MM in NT
MM in FreeBSD


1GB for kernel, unless you want lots of small processes using lots of kernel services, then configure for 2GB


First few virtual pages reserved to catch bad pointers.


Shared libraries placed by default just below default stack limit.


Memory Lists:


Wired


Active


Inactive. Dirty (min: 0%, max 4.5%)


Pageout daemon moves pages from Inactive to “cache”


attempts to balance i/o load by not starting too many concurrent writes.


Runs as a kernel process. This way it has access to kernel data structures, etc., but can be scheduled to run as a process so it can sleep.


Cache. Clean. (min: 3%, max 6%)


Free. (min: 0.7%, max 3%)


One end is zeroed, the other is not.


Idle process tries to keep 75% of frames on Free list zeroed.
MM in FreeBSD


Page coloring. Attempt to avoid cache conflicts.


Cache holds pages. A 1MB cache with 4KB pages has 256 cache pages. Each physical page on a 1MB boundary maps to the same cache page. Thus each cache page represents as many physical pages as there are MBs of physical memory.

When allocating frames, color coding attempts to avoid such potential conflicts by making contiguous virtual pages get frames that map to contiguous cache pages.
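
The arithmetic behind page coloring can be sketched as below. It assumes a direct-mapped, physically indexed cache, and the function names are hypothetical rather than FreeBSD's own.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kCacheBytes = 1u << 20;                 // 1MB cache
constexpr std::size_t kPageBytes = 4096;                      // 4KB pages
constexpr std::size_t kColors = kCacheBytes / kPageBytes;     // 256 cache pages

// Cache page ("color") that a physical frame maps to.
std::size_t color_of_frame(std::size_t frame) { return frame % kColors; }

// Color the allocator would prefer for a virtual page, so that contiguous
// virtual pages end up in contiguous cache pages.
std::size_t preferred_color(std::size_t virtual_page) {
    return virtual_page % kColors;
}
```

Frames 0 and 256 are 1MB apart and share color 0, so an allocator following this scheme would avoid giving both to neighbouring virtual pages.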


Pure demand paging when swapping back in.


Back in BSD4.3, count of resident pages was used. DIoF says might return.

Page Replacement

Least actively used algorithm.


Page has count of three when first brought in


Incremented each time reference bit found set to a max of 64


Decremented each time referenced bit found not set.


At zero page moved from Active to Inactive list.
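
The counting rule described above might be sketched like this. The field and function names are invented for illustration and do not come from the FreeBSD source; only the constants 3 and 64 come from the slide.

```cpp
#include <algorithm>
#include <cassert>

constexpr int kInitialCount = 3;  // count when the page is first brought in
constexpr int kMaxCount = 64;     // upper bound on the count

struct ActivePage {
    int act_count = kInitialCount;  // hypothetical name for the activity count
};

// One scan of a page on the Active list. Returns true when the count has
// reached zero and the page should move to the Inactive list.
bool scan_page(ActivePage& page, bool reference_bit_set) {
    if (reference_bit_set)
        page.act_count = std::min(page.act_count + 1, kMaxCount);
    else if (page.act_count > 0)
        --page.act_count;
    return page.act_count == 0;
}
```

A freshly loaded page that is never referenced therefore survives two scans and is deactivated on the third.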
Segmentation (1)


One-dimensional address space with growing tables


One table may bump into another
Segmentation (2)
Allows each table to grow or shrink, independently

Segmentation (3)
Comparison of paging and segmentation

Implementation of Pure Segmentation

Segmentation with Paging: MULTICS (1)


Descriptor segment points to page tables


Segment descriptor – numbers are field lengths

Segmentation with Paging: MULTICS (2)
A 34-bit MULTICS virtual address
Segmentation with Paging: MULTICS (3)
Conversion of a 2-part MULTICS address into a main memory address
Segmentation with Paging: MULTICS (4)


Simplified version of the MULTICS TLB


Existence of 2 page sizes makes actual TLB more complicated
Segmentation with Paging: Pentium (1)

A Pentium selector
Segmentation with Paging: Pentium (2)



Pentium code segment descriptor


Data segments differ slightly

Segmentation with Paging: Pentium (3)

Conversion of a (selector, offset) pair to a linear address
Segmentation with Paging: Pentium (4)

Mapping of a linear address onto a physical address
Segmentation with Paging: Pentium (5)

Protection on the Pentium
Level