

Dec 14, 2013


Linux vs. Windows NT Memory Management


1. Introduction

2. Linux Memory Management

2.1 Address Generation in x86

2.2 How Linux Does This

2.3 Page Allocation

2.4 Page Replacement Algorithm

2.5 Kernel Memory Allocation

2.5.1 Slab Layout

3. Windows NT Memory Management

3.1 Reserved vs. Committed Memory

3.2 Page Frame Database

3.3 Page Replacement



1. Introduction :

Linux and Windows NT (WNT) share a common concept in memory management: virtual
memory with paging. The purpose of this paper is to explain memory management in
Linux and to provide a brief introduction to WNT memory management for comparison
purposes. Research in memory management has had a great impact on the design of
hardware, so memory management cannot be explained without discussing the support
that the microprocessor or other hardware provides to the operating system.

Although memory management in Linux is platform independent, most of the
platforms it runs on share a common architecture of page tables with varying numbers of
paging levels. Linux memory management was designed with the 64-bit Alpha processor
in mind, but it easily accommodates other platforms with slight modifications.

This discussion is therefore somewhat hardware dependent, although the basic idea is
the same everywhere. The hardware I have chosen is the Intel Pentium/x86. The
information is valid for any Intel general-purpose processor from the 80386 to the latest
Pentium.

2. Linux Memory Management :

Like SVR4 and Solaris, Linux uses two separate memory management schemes:
virtual memory management for user processes and kernel memory management for the
kernel's own use. Linux divides the address space into two parts. Addresses from
0 to 3GB (0xBFFFFFFF) are used for user processes, and addresses from 3GB to 4GB
(0xFFFFFFFF) are for the kernel. This arrangement is shown in figure 2.1.




Figure 2.1 Linux Address Space



User Space

In user space, a demand-paged virtual memory scheme is used. To understand this
concept fully, let us consider the address generation mechanism of the x86.

2.1 Address Generation in x86 :

The Intel x86 provides support for both segmentation and paging. The maximum
segment size is 4GB, which is the complete linear address space of the processor. Smaller
segments are created by specifying the limit field in the descriptor of the segment.

Figure 2.2 Segmentation And Paging In x86 (diagram labels: logical address with
selector and offset; global descriptor table; linear address with directory, table and
offset fields; page directory and page table; physical address)

To locate a byte in a particular segment, a logical address must be provided. A
logical address consists of a segment selector and an offset. A selector is a unique
identifier for a segment. Among other things, it provides an offset into a descriptor table
to a data structure called a segment descriptor. A segment descriptor provides the base
address of the segment, along with the access rights and limit of the segment.

This base address is added to the offset from the logical address to generate a
linear address.
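
The translation from a logical to a linear address can be sketched in a few lines of Python. This is only an illustration of the base-plus-offset step described above, not processor or kernel code: the class name `SegmentDescriptor`, the function `logical_to_linear`, and the table contents are hypothetical, and the selector is simplified to a plain index into the descriptor table.

```python
from dataclasses import dataclass


@dataclass
class SegmentDescriptor:
    base: int    # linear address at which the segment starts
    limit: int   # highest valid offset inside the segment


def logical_to_linear(descriptor_table, selector_index, offset):
    """Translate (selector, offset) into a 32-bit linear address."""
    descriptor = descriptor_table[selector_index]
    # The limit check is what lets a segment be smaller than 4GB.
    if offset > descriptor.limit:
        raise ValueError("offset is beyond the segment limit")
    return (descriptor.base + offset) & 0xFFFFFFFF


# Hypothetical descriptor table with one large and one small segment.
example_table = [
    SegmentDescriptor(base=0x00000000, limit=0xBFFFFFFF),
    SegmentDescriptor(base=0x10000000, limit=0x0000FFFF),
]
linear = logical_to_linear(example_table, 1, 0x1234)
```

With a base of 0 and a large limit, as in the first entry, the linear address simply equals the offset.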

Now, if paging is not used, the linear address space of the processor is mapped
directly onto the physical address space of the processor. But if paging is used, the
32-bit linear address is treated as follows:

 31         22 21         12 11          0
|  Directory  |    Table    |   Offset    |

Figure 2.3 Linear Address

Here the leftmost (most significant) 10 bits select a second-level page table from the
first-level page table, called the page directory. The next 10 bits select a page from the
second-level page table, and the last 12 bits are the address of the byte within the 4KB
page.
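
As a rough sketch of the 10/10/12 split, the following Python fragment extracts the three fields and performs a two-level lookup. The function names and the use of nested dictionaries as page tables are assumptions made for illustration; real page tables are arrays of 32-bit entries with flag bits.

```python
def split_linear_address(addr):
    """Split a 32-bit linear address into (directory, table, offset)."""
    directory = (addr >> 22) & 0x3FF   # bits 31..22: index into the page directory
    table = (addr >> 12) & 0x3FF       # bits 21..12: index into the page table
    offset = addr & 0xFFF              # bits 11..0: byte within the 4KB page
    return directory, table, offset


def translate(page_directory, addr):
    """Walk a two-level table; a missing entry raises KeyError (a 'page fault')."""
    directory, table, offset = split_linear_address(addr)
    page_table = page_directory[directory]
    frame_base = page_table[table]
    return frame_base + offset


# Hypothetical mapping: one page at linear 0xC0101000 backed by frame 0x00345000.
example_directory = {768: {257: 0x00345000}}
physical = translate(example_directory, 0xC0101ABC)
```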

2.2 How Linux Does This :

As noted above, segments can be any size from 0 to 4GB. Linux uses only two
sizes. All the segments in user space, for all processes, are 3GB, and the segments
in kernel space are 1GB, starting at 3GB. This means Linux uses a kind of flat memory
model in which all the segments in user space share the same address space. How, then,
is memory protected in this multitasking environment? Protection at the page
level is used for this purpose. In a sense, Linux uses a pure paging mechanism for virtual
memory management. Now let us consider the platform-independent paging scheme of
Linux.



Linux makes use of a three-level page table structure consisting of the following
types of tables :

Page Directory : This is the top-level node, known as the PAGE GLOBAL DIRECTORY or
“pgd”.

Page Middle Directory : A middle-level node, called the PAGE MIDDLE DIRECTORY or
“pmd”.

Page Table : A bottom-level node, which holds the actual PTEs (page table entries)
describing pages.

Since x86 supports only two-level paging, the code that traverses the
“middle level” of page tables does nothing on the x86 architecture: it gets
preprocessed and compiled down to essentially nothing via platform-specific #ifdefs.
This allows other code to be written as though all machines had three-level page tables.
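
The idea of a folded middle level can be illustrated with a small Python model. This is an analogy only: the kernel does the folding at compile time in C, whereas this sketch uses a run-time layout description. The names `PagingLayout`, `indices` and `walk`, and the three-level bit widths, are hypothetical values chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PagingLayout:
    pgd_bits: int
    pmd_bits: int     # 0 means the middle level is folded away
    pte_bits: int
    offset_bits: int


X86_TWO_LEVEL = PagingLayout(pgd_bits=10, pmd_bits=0, pte_bits=10, offset_bits=12)
THREE_LEVEL_EXAMPLE = PagingLayout(pgd_bits=10, pmd_bits=10, pte_bits=10, offset_bits=13)


def indices(layout, addr):
    """Return (pgd, pmd, pte, offset) indices for addr under the given layout."""
    offset = addr & ((1 << layout.offset_bits) - 1)
    addr >>= layout.offset_bits
    pte = addr & ((1 << layout.pte_bits) - 1)
    addr >>= layout.pte_bits
    pmd = addr & ((1 << layout.pmd_bits) - 1)   # always 0 when pmd_bits == 0
    addr >>= layout.pmd_bits
    pgd = addr & ((1 << layout.pgd_bits) - 1)
    return pgd, pmd, pte, offset


def walk(layout, pgd_table, addr):
    """Generic three-level walk; the pmd step does nothing on a folded layout."""
    pgd, pmd, pte, offset = indices(layout, addr)
    middle = pgd_table[pgd]
    page_table = middle if layout.pmd_bits == 0 else middle[pmd]
    return page_table[pte] + offset
```

The caller of `walk` never needs to know how many levels the hardware really has, which is the property the text attributes to the Linux page table code.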

2.3 Page Allocation :

The part of memory management that handles the allocation of pages, that is, the part
that manages physical memory, is called the zone allocator. Different ranges of physical
pages may have different properties for the kernel's purposes. For example, DMA may only
work for physical addresses below 16MB. The zone allocator handles such differences
by dividing memory into a number of zones and treating each zone as a unit for allocation
purposes. Within each zone, the buddy system is used to manage physical pages. Pages
are always allocated in blocks of 2^n pages aligned on a 2^n-page boundary.
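
The power-of-two block sizes and alignment are what make the buddy system cheap: the "buddy" of a block can be found with a single XOR. The helper names below are hypothetical and the sketch works on page frame numbers only; it is not the kernel's allocator.

```python
def order_for(pages):
    """Smallest n such that a block of 2**n pages can hold the request."""
    order = 0
    while (1 << order) < pages:
        order += 1
    return order


def is_aligned(pfn, order):
    """A block of 2**order pages must start on a 2**order-page boundary."""
    return pfn % (1 << order) == 0


def buddy_of(pfn, order):
    """Page frame number of the neighbouring block this one can merge with."""
    return pfn ^ (1 << order)
```

For example, a request for 3 pages is served from an order-2 block (4 pages), and the order-2 block starting at page frame 8 has its buddy at page frame 12.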

2.4 Page Replacement Algorithm


The major component of the page replacement mechanism is a clock algorithm.
The clock algorithm is used because it provides an approximation of LRU replacement
and is cheaper to implement. In addition, all common general-purpose CPUs have hardware
support for the clock algorithm in the form of the reference bit maintained by the PTE
cache.

The simple clock scheme, which uses only one bit, is known as the “second chance”
algorithm, because it gives a page a second chance to stay in memory for one more sweep.

Linux uses a simple second chance (one-bit clock) algorithm, but with several
elaborations and complications.
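
A minimal sketch of the plain one-bit clock ("second chance") scheme follows. It models only the textbook algorithm, not the elaborated version Linux actually uses; the class name `ClockReplacer` and its interface are assumptions for illustration.

```python
class ClockReplacer:
    """Textbook second-chance replacement over a fixed number of frames."""

    def __init__(self, nframes):
        self.pages = [None] * nframes        # which page each frame holds
        self.referenced = [False] * nframes  # one reference bit per frame
        self.hand = 0                        # position of the clock hand

    def access(self, page):
        """Touch a page; return the evicted page, or None if nothing was evicted."""
        if page in self.pages:
            # Hit: the hardware would set the reference bit.
            self.referenced[self.pages.index(page)] = True
            return None
        while True:
            frame = self.hand
            self.hand = (self.hand + 1) % len(self.pages)
            if self.pages[frame] is None or not self.referenced[frame]:
                victim = self.pages[frame]
                self.pages[frame] = page
                self.referenced[frame] = True
                return victim
            # Referenced since the last sweep: clear the bit, give a second chance.
            self.referenced[frame] = False
```

A page that is touched between two sweeps of the hand survives the second sweep, while an untouched neighbour is evicted instead.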

2.5 Kernel Memory Allocation :

The buddy-system-based zone allocator discussed above is a simple and
relatively fast allocator, but it is a poor allocator in many respects. The fact that it can
only manage block sizes in powers of two means that using it straightforwardly requires
rounding requested block sizes up to a power of two, which can incur a large cost in
internal fragmentation.
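
The cost of rounding up can be made concrete with a small calculation. The function names are hypothetical; the sketch simply rounds a request up to the next power of two and reports how much of the block is wasted.

```python
def round_up_pow2(n):
    """Smallest power of two that is >= n (for n >= 1)."""
    size = 1
    while size < n:
        size <<= 1
    return size


def internal_fragmentation(request):
    """Return (wasted_units, wasted_fraction) for a power-of-two allocator."""
    block = round_up_pow2(request)
    wasted = block - request
    return wasted, wasted / block
```

A request just one unit above a power of two, such as 4097, wastes 4095 of the 8192 units allocated, which is almost half the block.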

Linux therefore uses one more memory allocator for the kernel's use, called the slab
allocator. The basic idea behind the slab allocator is the concept of “object caching”,
a technique for dealing with objects that are frequently allocated and freed. In the kernel,
small objects, such as mutexes for synchronization, are created and destroyed very
frequently. In many cases, however, the cost of initializing and destroying an object
exceeds the cost of allocating and freeing memory for it. So the idea is to preserve the
invariant portion of an object's initial state (its constructed state) between uses, so that
it does not have to be destroyed and recreated every time the object is used. This is
achieved by caching the objects in small buffers.
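
Object caching can be sketched as a free list that hands back already-constructed objects. This is a simplified illustration, not the kernel interface: `ObjectCache`, `Mutex` and their fields are hypothetical, and the caller is assumed to return each object in its constructed state.

```python
class Mutex:
    """Stand-in for a small kernel object with a non-trivial constructor."""

    def __init__(self):
        self.locked = False
        self.waiters = []


class ObjectCache:
    def __init__(self, constructor):
        self._constructor = constructor
        self._free = []          # constructed objects waiting to be reused
        self.constructed = 0     # how many times the constructor actually ran

    def alloc(self):
        if self._free:
            return self._free.pop()      # reuse: no construction cost
        self.constructed += 1
        return self._constructor()

    def free(self, obj):
        # The object is kept in its constructed state, not destroyed.
        self._free.append(obj)


mutex_cache = ObjectCache(Mutex)
first = mutex_cache.alloc()
mutex_cache.free(first)
second = mutex_cache.alloc()     # same object, constructor not run again
```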

The slab allocator uses the zone allocator to get largish hunks of memory and
carves them into smaller pieces as needed.

A slab consists of one or more pages of virtually contiguous memory carved up
into equal-size chunks, with a reference count of how many of those chunks have been
allocated.

2.5.1 Slab Layout:

The contents of each slab are managed by a kmem_slab structure that maintains
the slab’s linkage in the cache, its reference count, and its list of free buffers. In turn,
each buffer in the slab is managed by a kmem_bufctl structure that holds the freelist
linkage, the buffer address, and a back pointer to the controlling slab. This arrangement is
shown in figure 2.4.
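
The relationship between the two structures can be modelled roughly as follows. Only the names kmem_slab and kmem_bufctl come from the text; the Python classes, field names and methods are assumptions made for this sketch and do not reproduce the real C layout.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class KmemBufctl:
    address: int                               # address of the buffer it controls
    slab: "KmemSlab"                           # back pointer to the controlling slab
    next_free: Optional["KmemBufctl"] = None   # freelist linkage


@dataclass(eq=False)
class KmemSlab:
    base: int
    buf_size: int
    nbufs: int
    refcount: int = 0                          # buffers currently allocated
    free_list: Optional[KmemBufctl] = None     # head of the free buffer list
    next_slab: Optional["KmemSlab"] = None     # linkage in the cache

    def __post_init__(self):
        # Carve the slab into equal-size buffers, lowest address first on the list.
        for i in reversed(range(self.nbufs)):
            self.free_list = KmemBufctl(self.base + i * self.buf_size, self, self.free_list)

    def alloc(self):
        ctl = self.free_list
        if ctl is None:
            return None                        # slab is full
        self.free_list = ctl.next_free
        ctl.next_free = None
        self.refcount += 1
        return ctl

    def free(self, ctl):
        if ctl.slab is not self:
            raise ValueError("buffer does not belong to this slab")
        ctl.next_free = self.free_list
        self.free_list = ctl
        self.refcount -= 1
```

When `refcount` drops back to zero, every buffer is on the free list again and the slab's pages could be returned to the zone allocator.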

Figure 2.4 Slab Layout

3. Windows NT Memory Management


Windows NT provides a page-based memory management scheme that
allows applications to realize a 32-bit linear address space of 4GB. Like
Linux, WNT divides this address space into two parts, although in WNT the two parts are
equal, 2GB each. This is shown in figure 3.1. As in Linux, the upper part of the address
space is reserved for the system and the lower part is for user processes. Also like Linux,
WNT did not choose the segmented memory architecture; it implements a pure
demand-paged virtual memory system. The same discussion of how addresses are generated
on the x86 architecture also applies to WNT. As noted, the address space integrity of a
process is preserved at the page level. This is achieved in two ways. First, each process
has its own page directory, so it cannot access the address space of any other process.
Second, the access rights bits of the PTE can be used to protect individual pages from
being accidentally corrupted by the process itself.

Figure 3.1 Windows NT’s Address Space (upper 2GB, up to 4 GB: reserved for use by the
system; lower 2GB: available for use by user processes)

3.1 Reserved vs. Committed Memory :

In Windows NT, a distinction exists between memory and address space.
Although each process has a 4-GB address space, rarely if ever will it realize anywhere
near that amount of physical memory. Consequently, the virtual-memory manager (VMM)
must keep track of the used and unused addresses of a process, independent of the pages
of memory it is actually using. In practice, this amounts to having one structure for
representing all of the physical memory in the system and one structure for representing
each process's address space.

As part of the process object (the overhead associated with every process in
Windows NT), the VMM stores a structure called the virtual address descriptor (VAD)
tree to represent the address space of a process. As address space gets used for a process,
the VMM updates the VAD tree to reflect which addresses are used and which are not.

3.2 The Page-Frame Database:

The virtual-memory manager uses a private data structure for maintaining the
status of every physical page of memory in the system. The structure is called the
page-frame database. The database contains an entry for every page in the system, as well
as a status for each page. The status of each page falls into one of the following categories:

Valid : A page in use by an active process in the system. Its PTE is marked
as valid.

Modified: A page that has been written to, but not written to disk. Its PTE is marked as
invalid and in transition.

Standby : A page that has been removed from a process's working set but is still resident
in memory. Its PTE is marked as invalid and in transition.

Free : A page with no corresponding PTE and available for use. It must first be zeroed
before being used unless it is used as a read-only page.

Zeroed : A free page that has already been zeroed and is immediately available for use by
any process.

Bad : A page that has generated a hardware error and cannot be used by any process in
the system.

Most of the status types are common to most paged operating systems, but the
two transitional page status types are unique to Windows NT. If a process addresses a
location in one of these pages, a page fault is still generated, but very little work is
required of the VMM. Transitional pages are marked as invalid, but they are still resident
in memory, and their location is still valid in the PTE. The VMM merely has to change
the status on this page to reflect that it is valid in both the PTE and the page-frame
database, and let the process continue.
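
The cheap handling of a fault on a transitional page can be sketched as below. This is a toy model, not the Windows NT implementation: the names `FrameState`, `PTE` and `handle_fault`, the labels "soft" and "hard", and the use of a dictionary as the page-frame database are all assumptions, and only the Modified transitional state is modelled.

```python
from enum import Enum


class FrameState(Enum):
    VALID = "valid"
    MODIFIED = "modified"   # transitional: written to, not yet on disk
    FREE = "free"
    ZEROED = "zeroed"
    BAD = "bad"


class PTE:
    def __init__(self, frame):
        self.frame = frame          # physical frame number, still valid in transition
        self.valid = True
        self.in_transition = False


def handle_fault(pte, frame_db):
    """Return 'soft' if the fault is resolved without I/O, 'hard' otherwise."""
    if not pte.valid and pte.in_transition:
        # The page is still resident: just flip the status in both places.
        pte.valid = True
        pte.in_transition = False
        frame_db[pte.frame] = FrameState.VALID
        return "soft"
    return "hard"   # the page would have to be read in or freshly allocated
```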

Figure 3.2 (diagram labels: Process Page Table, Page Frame Database)

3.3 Page Replacement :

In Windows NT, the component responsible for making page replacement
decisions is called the working-set manager. When a process starts, the VMM assigns it a
default working set that indicates the minimum number of pages necessary for the
process to operate efficiently. The working-set manager periodically tests this quota by
stealing pages of memory from a process. If the process continues to execute
without generating a page fault for the stolen page, the working set is reduced by one,
and the page is made available to the system.

The act of stealing a page from a process actually occurs in two stages. First, the
working-set manager changes the PTE for the page to indicate an invalid page in
transition. Second, the working-set manager also updates the page-frame database entry
for the physical page, marking it as either modified or standby, depending on whether
the page is dirty or not.
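
The two stages of stealing a page, and the later decision about the working set, can be sketched as follows. This is an illustrative model only: the function names, the string state labels and the `faulted_back` flag are assumptions, and "standby" is used here as the label for the clean transitional state.

```python
from dataclasses import dataclass


@dataclass
class PTE:
    frame: int
    valid: bool = True
    in_transition: bool = False
    dirty: bool = False


def steal_page(pte, frame_db):
    """Take a page away from a process without discarding its contents."""
    # Stage 1: mark the PTE as an invalid page in transition.
    pte.valid = False
    pte.in_transition = True
    # Stage 2: record the new status in the page-frame database.
    frame_db[pte.frame] = "modified" if pte.dirty else "standby"


def finish_test(pte, frame_db, working_set_size, faulted_back):
    """Return the working-set size after the stolen page has been tested."""
    if faulted_back:
        # The process still needs the page: restore it, keep the quota.
        pte.valid = True
        pte.in_transition = False
        frame_db[pte.frame] = "valid"
        return working_set_size
    # No fault occurred: the working set shrinks by one page.
    return working_set_size - 1
```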

