Linux Memory Management
Presented by Craig M. Grube
Why do we need it?
Compensation for limited address space
High cost of main memory, lower cost and greater abundance of secondary storage
Static storage evaluation became too difficult...
What does it do?
Allows processes that may not be entirely in memory to execute by means
of automatic storage allocation.
A program's logical memory is separated from physical memory.
Data is swapped between main memory and secondary storage according
to some replacement algorithm (Paging and Segmentation)
Program must only be aware of logical memory space, and the OS may
maintain multiple levels of physical memory space.
Programs are unable to access the memory allocated to other processes. To
do so, a special request must be made to the OS.
Mapper function must be very fast.
Extra record keeping (Page Table, Inverted Page Table)
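The mapping from logical to physical memory can be sketched as a table lookup. This is a minimal illustration, assuming a single-level page table and a 4096-byte page; the names `PAGE_SIZE`, `translate`, and `page_table` are hypothetical and not taken from any kernel.

```python
PAGE_SIZE = 4096  # assumed page size in bytes

def translate(page_table, logical_addr):
    """Map a logical address to a physical one via a one-level page table."""
    page, offset = divmod(logical_addr, PAGE_SIZE)
    frame = page_table.get(page)
    if frame is None:
        # Page is not resident: a real OS would handle a page fault here
        raise LookupError(f"page fault on page {page}")
    return frame * PAGE_SIZE + offset

# Hypothetical table: logical page number -> physical frame number
page_table = {0: 5, 1: 2}
physical = translate(page_table, 4100)  # logical page 1, offset 4
```

Every memory access goes through a lookup like this, which is why the mapper has to be fast and why the table itself counts as extra record keeping.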
Challenges for Linux Kernel:
Must be suitable for small handheld devices and able to scale up to enterprise systems
Must be able to run on a large variety of hardware
Kernel Memory Allocation
Cross Architectural Abstractions
VM Caching and Replacement
Kmalloc - Memory allocated in frame_size * 2^n blocks
- takes memory from kmalloc and cuts it
into smaller pieces
Separate allocator for buffers
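The frame_size * 2^n sizing rule can be illustrated with a short sketch. It assumes a 4096-byte frame and only shows how a request is rounded up; `FRAME_SIZE`, `block_order`, and `block_size` are hypothetical names, not kernel functions.

```python
FRAME_SIZE = 4096  # assumed frame size in bytes

def block_order(request_bytes):
    """Smallest n such that FRAME_SIZE * 2**n can hold the request."""
    if request_bytes <= 0:
        raise ValueError("request must be positive")
    n = 0
    while FRAME_SIZE * (1 << n) < request_bytes:
        n += 1
    return n

def block_size(request_bytes):
    """Size in bytes of the block that would be handed out."""
    return FRAME_SIZE * (1 << block_order(request_bytes))

# A 5-frame request is rounded up to 8 frames
wasted = block_size(5 * FRAME_SIZE) - 5 * FRAME_SIZE
```

Rounding to powers of two keeps allocation simple but can waste a large part of a block, which is one reason to cut blocks into smaller pieces for small objects.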
Assumes 3-level page tables (Alpha uses 3, x86 uses
Complicated 1-bit clock algorithm
Kswapd - 1-bit clock algorithm that sweeps over
virtual pages, does incremental sweeps on multiple
processes, runs as a separate thread
Shm_swap - clock algorithm for System V IPC
Buffers - handled differently by separate kernel
Shrink_mmap - 'back end' clock algorithm,
sweeps over physical frames. Initiates the actual
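A generic 1-bit clock (second-chance) algorithm can be sketched as follows. This is the textbook form over a fixed set of frames, not the 2.2.x kernel code; the `Clock` class and its method names are hypothetical.

```python
class Clock:
    """One-bit clock (second-chance) replacement over a fixed set of frames."""

    def __init__(self, nframes):
        self.pages = [None] * nframes        # page held by each frame
        self.referenced = [False] * nframes  # one reference bit per frame
        self.hand = 0                        # current position of the clock hand

    def access(self, page):
        """Touch a page; return the page evicted to make room, or None."""
        if page in self.pages:
            self.referenced[self.pages.index(page)] = True
            return None
        # Sweep: clear reference bits until an unreferenced frame is found
        while self.referenced[self.hand]:
            self.referenced[self.hand] = False
            self.hand = (self.hand + 1) % len(self.pages)
        victim = self.pages[self.hand]
        self.pages[self.hand] = page
        self.referenced[self.hand] = True
        self.hand = (self.hand + 1) % len(self.pages)
        return victim
```

A page whose bit is set when the hand passes gets one more trip around the clock before it can be evicted. The kernel variants above differ mainly in what they sweep over (virtual pages, shared memory, or physical frames).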
Discussion of 2.2.x
Makes small sweep, then moves to a different process. Swaps
out related pages.
Avoids evicting a large number of pages from a given process
Multiple clocks allow for more complicated aging
- 2.4.9 (Rik van Riel VM)
3 physical zones
ZONE_DMA - 0 - 16 MB, DMA, kernel data structures, user memory
ZONE_NORMAL - 16 - 900 MB, kernel data structures, user memory
ZONE_HIGHMEM - > 900 MB, user memory
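The zone split can be read as a lookup on the physical address. The sketch below uses the approximate boundaries from the slide (16 MB and 900 MB); `ZONES` and `zone_for` are hypothetical names for illustration only.

```python
MB = 1 << 20

# (zone name, exclusive upper bound in bytes); bounds as given on the slide
ZONES = [("ZONE_DMA", 16 * MB), ("ZONE_NORMAL", 900 * MB)]

def zone_for(phys_addr):
    """Return the zone that a physical address falls into."""
    for name, limit in ZONES:
        if phys_addr < limit:
            return name
    return "ZONE_HIGHMEM"  # everything above the last bound
```

For example, `zone_for(8 * MB)` falls in ZONE_DMA, while an address at 1024 MB falls in ZONE_HIGHMEM and can only hold user memory.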
5 page uses
Mapped pages - mapped in process page tables
Pagecache and swapcache - caches parts of files and/or swap space
Shared memory - SYS V or POSIX shared memory segments
Buffer cache - disk data that doesn't fit into page cache
Kernel memory - cannot be swapped out, used for page tables, slab cache, and
task structures / kernel stack.
Balanced page aging
Aim to make VM more robust under
Used to balance page aging and flushing
Active - pages in active use (get aged)
Inactive_dirty - pages might become
reclaimable, cleaned using page_launder.
Inactive_clean - not in active use, immediately
Makes two passes.
First moves accessed or mapped pages back
into active list, move clean unmapped pages to
Second uses asynchronous and synchronous I/O
to free pages
Complex because we don't want to start disk
I/O if not needed, don't want to wait for I/O
and don't want to start too much I/O
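The two passes can be sketched roughly as below. This is an illustration under stated assumptions, not the kernel's page_launder: it assumes clean unmapped pages go to the inactive_clean list, it models a write as simply clearing the dirty flag, and the `wanted` and `max_writes` parameters are hypothetical stand-ins for "no I/O if not needed" and "not too much I/O".

```python
from dataclasses import dataclass

@dataclass
class Page:
    name: str
    dirty: bool = False
    accessed: bool = False
    mapped: bool = False

def launder(inactive_dirty, active, inactive_clean, wanted, max_writes):
    """Two-pass sweep over inactive_dirty; returns the number of writes started."""
    # Pass 1: no I/O. Rescue pages still in use, move clean pages out.
    freed = 0
    remaining = []
    for page in inactive_dirty:
        if page.accessed or page.mapped:
            active.append(page)
        elif not page.dirty:
            inactive_clean.append(page)
            freed += 1
        else:
            remaining.append(page)
    # Pass 2: start writes only while more pages are wanted and the cap allows.
    writes = 0
    still_dirty = []
    for page in remaining:
        if freed < wanted and writes < max_writes:
            page.dirty = False  # stands in for a completed write
            inactive_clean.append(page)
            freed += 1
            writes += 1
        else:
            still_dirty.append(page)
    inactive_dirty[:] = still_dirty
    return writes
```

If the first pass already finds enough clean pages, the second pass starts no I/O at all.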
Need to keep track of free pages in each
Pages can be shared by many processes,
and we don't know which ones
Balanced page eviction
2.4.10 - Present
Performs better than 2.2.x VM on desktop
systems, but fails on more systems.
RedHat ships with Rik van Riel's VM
Improved and optimized VM from 2.2.x
Kswapd looping on DMA or NORMAL
Higher performance under heavy load
Fewer swapout storms
2 queues (inactive_list and active_list)
The queues rotate at different rates.
Queue evictions are decided in a round-robin
fashion, but due to locking (for I/O or a VM
critical section) some pages are rolled over
to the next pass (which ends up
approximating LRU).
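The roll-over behaviour can be sketched with a queue. This is a simplified model, assuming eviction takes pages from the front of the inactive list and that locked pages are given as a set; `evict_one` and `locked` are hypothetical names, not kernel identifiers.

```python
from collections import deque

def evict_one(inactive_list, locked):
    """Pop the first unlocked page; locked pages roll to the back of the queue."""
    for _ in range(len(inactive_list)):
        page = inactive_list.popleft()
        if page in locked:
            inactive_list.append(page)  # rolled over to the next pass
        else:
            return page
    return None  # every page was locked, nothing can be evicted now

inactive_list = deque(["a", "b", "c"])
victim = evict_one(inactive_list, locked={"a"})
```

Here page "a" is skipped because it is locked, "b" is evicted, and "a" ends up behind "c" for the next pass.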
2.4.x VMs are a bit overkill for handhelds.
Embedded systems with hard time bounds
should avoid VM if possible.
If VM is necessary, a simple VM is preferable.
Devices need MMUs, otherwise they must use a
2.0.x kernel (very old).
Unable to find information about special VM
for resource constrained devices.