A Compacting Real-Time Memory Management System

Master's thesis for the attainment of the academic degree
Diplom-Ingenieur in Applied Computer Science
Written at the
Institut für Computerwissenschaften und Systemanalyse
of the Faculty of Natural Sciences
of the Paris-Lodron-Universität Salzburg
Submitted by
Bakk.techn. Hannes E. Payer
Submitted to
Univ.-Prof. Dr.-Ing. Dipl.-Inform. Christoph Kirsch
Salzburg, September 2007
Acknowledgments
First of all, I would like to thank my thesis advisor Prof. Christoph Kirsch, who supervised
me excellently during every phase of my thesis. Many thanks for the outstanding working
environment, all the exciting discussions, and the long, fruitful whiteboard sessions.
My deepest gratitude goes to my parents Eduard and Gertraud Payer, who have supported
me in my plans and wishes all my life. Without them, the smooth completion of my studies
would not have been possible. Thank you for being the best parents in the world. Great
thanks also go to my grandparents Josef and Margarete Eder, as well as to all other family
members, above all Josef and Helga Eder.
I thank my girlfriend Verena Kreilinger for the motivation and strength with which she
sweetens my life. It is wonderful to have such a person at one's side. Thank you for
always being there for me.
Thanks to the entire Computational Systems Group - in particular the Eurosys paper
crew: Silviu Craciunas, Ana Sokolova, Horst Stadler, and Robert Staudinger - for a great
time, inspiring ideas, and their help in preparing my thesis. I thank Andreas Löcker for
the great teamwork over all these years. Further thanks go to all friends, acquaintances,
and fellow students who have influenced my life.
Abstract
We introduce Compact-fit (CF), a compacting real-time memory management system for
allocating, deallocating, and dereferencing memory objects, which keeps the memory
compact at all times. CF comes with a moving and a non-moving implementation. In the
moving implementation, allocation takes constant time and deallocation drops from linear
to constant time if no compaction is necessary. In the non-moving implementation,
allocation and deallocation take time linear in the request size. Dereferencing takes
constant time in both implementations. Moreover, the system provides fully predictable
memory in the sense of fragmentation. In short, it is a real real-time memory management
system. We compare the moving and the non-moving CF implementation with established
memory allocators, which all fail to satisfy the memory predictability requirement. The
experiments confirm our theoretical complexity bounds and demonstrate competitive
performance. Furthermore, we introduce the partial compaction strategy, which allows
us to control the performance vs. fragmentation trade-off.
Contents
1 Introduction 1
1.1 Outline of the thesis...............................2
1.2 Contributions...................................3
2 Real-time memory management systems 5
2.1 Memory management basics..........................5
2.2 Real-time memory management requirements................7
2.3 Explicit dynamic memory management systems...............8
2.3.1 Sequential fit...............................9
2.3.2 Doug Lea’s allocator...........................9
2.3.3 Half-fit...................................9
2.3.4 Two-level segregated fit.........................10
2.3.5 Algorithms Complexity..........................10
2.4 Implicit dynamic memory management systems................11
2.4.1 Treadmill.................................11
2.4.2 Metronome................................11
2.4.3 Jamaica..................................12
2.5 Summary.....................................13
3 Compact-fit (CF) 15
3.1 Compaction....................................15
3.1.1 Abstract and concrete address space.................16
3.2 CF API......................................19
3.3 Size-classes concept...............................20
3.4 Types of fragmentation..............................21
3.4.1 Page-block-internal fragmentation...................21
3.4.2 Page-internal fragmentation.......................22
3.4.3 Size-external fragmentation.......................23
3.4.4 Fragmentation overview.........................24
3.5 Summary.....................................25
4 CF system 26
4.1 Compaction....................................26
4.1.1 The compaction algorithm........................27
4.1.2 Complexity................................29
4.2 The free-list concept...............................30
4.3 Page management internals...........................32
4.3.1 Size-class list...............................33
4.3.2 Size-class reference...........................33
4.3.3 Number of used page-blocks......................34
4.3.4 Free page-blocks.............................34
4.3.5 Used page-blocks............................34
4.3.6 Memory Overhead............................38
4.4 Moving Implementation.............................38
4.4.1 Concept..................................38
4.4.2 Allocation.................................39
4.4.3 Deallocation...............................39
4.4.4 Dereferencing..............................40
4.5 Non-moving implementation...........................41
4.5.1 Concept..................................41
4.5.2 Allocation.................................43
4.5.3 Deallocation...............................43
4.5.4 Dereferencing..............................44
4.6 Total memory overhead.............................44
4.7 Partial Compaction................................45
4.7.1 Allocation.................................46
4.7.2 Deallocation...............................46
4.8 Pointer Arithmetic.................................47
4.9 Initialization....................................47
4.10 Dynamic abstract address space........................47
4.10.1 Moving implementation.........................48
4.10.2 Non-moving implementation.......................48
4.11 Arraylets......................................49
4.12 Summary.....................................49
5 Experiments 50
5.1 Test environment.................................50
5.1.1 Execution Time..............................50
5.1.2 Processor instructions..........................52
5.2 Results......................................52
5.2.1 Moving vs. non-moving implementation benchmark.........52
5.2.2 Incremental benchmark.........................57
5.2.3 Rate-monotonic scheduling benchmark................60
5.3 Fragmentation..................................63
5.4 Summary.....................................68
6 Conclusion 69
6.1 CF usage guideline................................70
6.2 Future work....................................70
A Appendix 75
List of Figures
3.1 Memory states..................................15
3.2 Abstract address and pointer mapping.....................17
3.3 Memory object dependencies..........................18
3.4 Fragmented concrete address space......................19
3.5 Compacted concrete address space......................19
3.6 Bounded internal fragmentation p = 1/8....................22
3.7 Size-classes and different types of fragmentation...............24
4.1 Arbitrary fragmented pages of a size-class...................27
4.2 The green marked memory object becomes deallocated...........29
4.3 The size-class after applying Rule 1......................29
4.4 The green marked memory object becomes deallocated...........30
4.5 The size-class after applying Rule 2......................30
4.6 Page Layout....................................32
4.7 Used page-block list and free page-block list (next-page-block mode)....36
4.8 Used page-block list and free page-block list (free-list mode).........36
4.9 Two-dimensional bitmap (16 × 32)........................37
4.10 Explicit reference of a page-block to an abstract address...........39
4.11 Memory layout of the non-moving implementation...............42
5.1 Allocation instructions benchmark.......................54
5.2 Allocation clock ticks benchmark........................54
5.3 Deallocation & compaction instructions benchmark..............55
5.4 Deallocation & compaction clock ticks benchmark...............55
5.5 Deallocation partial compaction instructions benchmark...........56
5.6 Deallocation partial compaction clock ticks benchmark............56
5.7 Incremental allocation instructions benchmark.................58
5.8 Incremental allocation clock ticks benchmark.................58
5.9 Incremental deallocation & compaction instructions benchmark.......59
5.10 Incremental deallocation & compaction clock ticks benchmark........59
5.11 Incremental deallocation partial compaction instructions benchmark....61
5.12 Incremental deallocation partial compaction clock ticks benchmark.....61
5.13 Rate-monotonic allocation instructions benchmark..............62
5.14 Rate-monotonic allocation clock ticks benchmark...............62
5.15 Rate-monotonic deallocation instructions benchmark.............64
5.16 Rate-monotonic deallocation clock ticks benchmark.............64
5.17 Rate-monotonic deallocation partial compaction instructions benchmark..65
5.18 Rate-monotonic deallocation partial compaction clock ticks benchmark...65
5.19 Fragmentation test 1...............................66
5.20 Fragmentation test 2...............................67
5.21 Fragmentation test 3...............................67
List of Tables
2.1 Allocator Complexity...............................10
4.1 Administrative Memory Overhead........................45
5.1 CFM allocation benchmark results.......................53
5.2 CFM deallocation & partial compaction benchmark results..........53
5.3 CFM incremental allocation benchmark results................57
5.4 CFM incremental deallocation & partial compaction benchmark results...60
5.5 Rate-monotonic scheduling tasks........................60
5.6 CFM rate-monotonic allocation benchmark results..............63
5.7 CFM rate-monotonic deallocation & partial compaction benchmark results.63
1 Introduction
“In the beginning, the universe was created. This made a lot of people very
angry, and has been widely regarded as a bad idea.” – Douglas Adams
This thesis introduces a new real-time memory management system, called Compact-fit
(CF). It is a compacting real-time memory management system for allocating, deallocating,
and dereferencing memory objects. CF comes with a moving and a non-moving
implementation. Memory fragmentation in CF is bounded by a compile-time constant
and is reduced by performing compaction operations. Since CF uses an abstract address
space (pointer indirection) to reference memory objects, the reference updates caused
by compaction operations are bounded: only the indirection pointer has to be updated.
In the moving implementation, allocation takes constant time and deallocation drops from
linear to constant time if no compaction is necessary. In the non-moving implementation,
allocation and deallocation take time linear in the memory object size. Dereferencing
takes constant time: one indirection in the moving implementation and two indirections
in the non-moving implementation.
Furthermore, we introduce a new pointer concept: a pointer in the CF model is an address
(abstract address) and an offset. The CF model therefore supports offset-based rather
than address-based pointer arithmetic. Note that, in principle, the moving implementation
may also support address-based pointer arithmetic, since each memory object is
allocated in a single, physically contiguous piece of memory. In the CF model the
compaction operations are bounded. Compaction may only happen upon freeing a memory
object and involves moving a single memory object of similar size.
Memory in the CF model is partitioned into 16KB pages. Each page is an instance
of a so-called size-class, which partitions the page further into same-sized page-blocks.
We adapt the concept of pages and size-classes from [3]. A memory object is always
allocated in a page of the smallest size-class whose page-blocks still fit the allocation
request. Memory objects larger than 16KB are currently not supported, but we suggest
how large memory objects can be handled.
The key idea in the CF model is to keep the memory size-class-compact at all times. In
other words, at most one page of each size-class may be not-full at any time, while all
other pages of the size-class are always kept full. Whenever freeing a memory object
leads to two pages in a size-class which are not full, a memory object of the not-full
page is moved to take the place of the freed memory object, thus maintaining the
invariant. If the not-full page becomes empty, it can be reused in any size-class: it is
moved to the pool of free pages. Using a simple free-list concept, free space can be
found in constant time upon an allocation request.
The moving implementation of the CF model maps page-blocks directly to physically
contiguous pieces of memory, and therefore requires moving whole memory objects
for compaction. Allocation takes constant time in the moving implementation, whereas
deallocation takes linear time if compaction occurs.
The non-moving implementation uses a block table (a virtual memory) to map page-blocks
to physical block-frames that can be located anywhere in memory. In this case,
compaction merely requires re-programming the block table rather than moving memory
objects, which makes compaction faster. However, although compaction is faster,
deallocation still takes time linear in the size of the object due to the block table
administration. For the same reason, allocation also takes linear time in the non-moving
implementation.
In both implementations we can relax the always-compact requirement and allow for
more than one not-full page per size-class. As a result, deallocation takes less time: it
reduces down to as little as constant time. This way we formalize, control, and implement
the trade-off between timing performance and memory fragmentation. This concept is
called partial compaction.
We present the results of benchmarking both implementations on a lightweight HAL
running on bare-metal Gumstix hardware and on Linux, as well as implementations of
non-compacting real-time memory management systems (Half-fit [25] and TLSF [22]) and
non-real-time memory management systems (First-fit [16], Best-fit [16], and Doug Lea’s
allocator [18]), using synthetic workloads which create worst-case and average-case
scenarios.
1.1 Outline of the thesis
We start this thesis with a discussion of memory management systems, with a focus on
real-time systems. The description of the CF model follows, and then the presentation
of the CF implementation. Finally, we present the results of the experiments and
benchmarks.
Chapter 1, Introduction: The introduction chapter gives an outline of this thesis and its
motivation.
Chapter 2, Real-time memory management systems: Chapter 2 gives an overview
of memory management systems and the requirements for real-time performance. The
problem of fragmentation is introduced. Established non-real-time memory management
systems like First-fit, Best-fit, and Doug Lea’s allocator, and non-compacting real-time
memory management systems like Half-fit and Two-level segregated fit, are discussed.
Finally, the memory management systems of the garbage-collected systems Treadmill,
Metronome, and Jamaica are presented.
Chapter 3, Compact-fit (CF): The model of CF is presented in Chapter 3. We introduce
the abstract and the concrete address space and present the size-classes concept.
Furthermore, we state fragmentation bounds for the size-class concept.
Chapter 4, CF system: The CF system is discussed in Chapter 4. There are two different
CF approaches: the moving and the non-moving implementation. Both are examined
in detail. Their asymptotic complexity and memory overhead are presented. Then the
partial compaction concept is explained, which brings deallocation down to constant time
for the moving implementation of CF. Additionally, we discuss extensions and
optimizations of the current CF implementations.
Chapter 5, Experiments: Chapter 5 presents the experiments and benchmarks. Three
different mutators are used, which generate synthetic worst-case and average-case
scenarios. The performance of both CF implementations is measured using clock-tick
and instruction benchmarks. The results are compared with the results of First-fit,
Best-fit, Doug Lea’s allocator, Half-fit, and TLSF. At the end of this chapter we present
fragmentation tests, where we compare the CF moving implementation with TLSF.
Chapter 6, Conclusion: In the last chapter we conclude the thesis. We present a review
of the thesis and outline its main contributions. Finally, we discuss future work.
Appendix A, CF implementation: The appendix lists the source code of the CF moving
and non-moving implementations. Note that the source code of both implementations is
merged; the desired approach is compiled by setting the respective flag. The
implementation is available under the GPL.
1.2 Contributions
The contribution of this thesis is the CF model and the concept of predictable memory
(predictable fragmentation). Based on the CF model, we implemented the moving and
the non-moving CF approach. Furthermore, we presented the partial compaction strategy
and implemented it for the CF moving implementation. Moreover, we benchmarked
the moving and non-moving implementations as well as the partial compaction strategy
of CF in a number of experiments and compared both CF implementations with the
explicit dynamic memory management algorithms First-fit, Best-fit, Doug Lea’s allocator,
Half-fit, and TLSF. We used two different platforms for the benchmarks: Gumstix running
a lightweight hardware abstraction layer to perform execution time measurements
(processor cycles), and the Linux operating system to perform processor instruction tests.
In addition, we performed fragmentation experiments, where we compare the memory
utilization of the CF moving implementation with TLSF.
4
2 Real-time memory management systems
“Developers of real-time systems avoid the use of dynamic memory manage-
ment because they fear that the worst-case execution time of dynamic mem-
ory allocation routines is not bounded or is bounded with an excessively large
bound.” – Isabelle Puaut
This chapter starts with an overview of memory management in general and the
requirements for real-time performance in particular. Memory fragmentation is a problem
in managing memory, which we also discuss in this chapter. Afterwards, dynamic
memory management systems are examined. We discuss established allocator strategies
like First-fit, Best-fit, Doug Lea’s allocator, Half-fit, and Two-level segregated fit, and
present the memory management systems of the garbage-collected systems Treadmill,
Metronome, and Jamaica.
2.1 Memory management basics
By memory management we mean dynamic memory management. Dynamic memory
management is a fundamental and well-studied part of operating systems. This core
unit has to keep track of used and unused parts of the memory. Applications use the
dynamic memory management system to allocate and free memory objects of arbitrary
size in arbitrary order. This is what makes the memory management dynamic. Moreover,
applications can use the memory management system for accessing already allocated
memory objects. This operation is called dereferencing. In programming languages like C
the memory management system does not handle memory dereferencing: an application
can access the whole contiguous memory directly. This is in contrast to virtual machines
like the Java virtual machine, which call explicit dereferencing methods to gain access to
the memory. The memory management system responds to an allocation request by
providing an available memory slot, to a deallocation request by freeing the occupied
memory slot, and to a dereferencing request by providing access to a memory location
within the allocated memory object. Memory deallocation can lead to memory holes,
which cannot be reused by future allocation requests if they are too small. Dynamic
memory management systems have to minimize this problem, called the fragmentation
problem. The complexity of allocation amounts to finding free memory
space, and it increases as fragmentation increases. The complexity of deallocation
may also be related to the fragmentation problem. Hence, fragmentation is a key issue in
memory management. As usual in the literature, we use the term fragmentation both for
the phenomenon of fragmented memory space and for the size of the fragmented parts
of the memory.
In general there are two types of fragmentation (an introduction to fragmentation is given
in [34]):
• Internal fragmentation occurs if the memory is partitioned into blocks. An allocation
that does not fill a whole block wastes the memory at the end of the block. This wasted
memory is called internal fragmentation.
• External fragmentation is a phenomenon in which the contiguous memory becomes
divided into many small pieces over time, which are not usable by further allocation
requests.
Johnstone [15] showed that a large class of programs tends to perform many allocation
operations of small and equal size. The majority of programs consist of just a few key
objects that are recently used and make up the nodes of large data structures upon which
the program operates. The remaining allocation operations belong to strings, arrays, and
buffers, which can be of varying and larger size. Johnstone concludes that fragmentation
can be ignored if the right allocation strategy for an application is chosen. This might be
true for short-running userland programs, but for safety-critical systems this argument
does not hold. For hard real-time systems the worst-case scenarios have to be taken
care of and fragmentation has to be considered.
A way to fight fragmentation is by performing compaction, also known as defragmentation.
Initially the free memory space is contiguous. Fragmentation results in a non-contiguous
free memory space. Compaction is the process of rearranging the used memory space
so that larger contiguous pieces of free memory become available. In the best case the
whole free memory becomes contiguous again.
There are two types of dynamic memory management systems:
• explicit, in which an application has to explicitly call the corresponding procedures of
the dynamic memory management system for allocating and deallocating memory,
and
• implicit, in which memory deallocation is implicit, i.e., allocated memory that is
no longer used is detected and freed automatically. Such systems are called
garbage-collected systems.
Explicit dynamic memory management systems usually cover low-level implementations,
in comparison to implicit dynamic memory management systems, which can, for
example, manage Java real-time systems. They are therefore in a way incomparable, but
Berger et al. [13] introduced an experimental methodology for quantifying the performance of
garbage collection vs. explicit memory management.
In this work we propose an explicit dynamic memory management system, which can be
used for both low- and high-level implementations.
2.2 Real-time memory management requirements
Traditional dynamic memory management strategies are typically non-deterministic. Most
of them are optimized to offer excellent best- and average-case response times, but their
worst case is unbounded. This is suitable for non-real-time systems, but for hard
real-time systems tight bounds have to exist. Therefore, dynamic memory allocators have
been avoided in the real-time domain. The memory used by real-time applications was
typically allocated statically, a sufficient solution for many real-time controllers. Nowadays
real-time applications have increasing complexity, which in turn demands greater flexibility
of memory allocation. Therefore, there is a need to design dynamic real-time memory
management systems.
In an ideal dynamic real-time memory management system, each unit operation (memory
allocation, deallocation, and dereferencing) takes constant time. We refer to this time as
the response time of the operation. If constant response times cannot be achieved, then
bounded response times are also acceptable. However, the response times have to be
bounded by the size of the actual request and not by the global state of the memory.
More precisely, real-time systems should exhibit predictability of response times and of
available resources.
If the response times are bounded, then they are predictable. The fragmentation problem
affects the predictability of the response times. Consider the following example. The
memory consists of n blocks of equal size. An application allocates all n blocks and
then deallocates every second block. As a result, 50% of the memory is free. Nevertheless,
any allocation request demanding at least two contiguous blocks cannot be served.
Depending on how the memory management resolves fragmentation, this situation can
have an effect on the response times. For example, if objects are moved to create
enough contiguous space upon an allocation request, then the response time of
allocation is no longer bounded, i.e., it may depend on the global state of the memory.
Predictability of available memory means that the number of actual allocations together
with their sizes determines how many more allocations of a given size will succeed
before running out of memory, independent of the allocation and deallocation history. In
a predictable system the amount of fragmentation is also predictable and depends only
on the actual allocations. In addition to predictability, fragmentation has to be minimized
for maximal utilization of the available memory.
None of the established explicit dynamic memory management systems meets the
requirement of predictable available memory, since their fragmentation depends on the
allocation and deallocation history.
As mentioned above, a way to minimize fragmentation is by performing compaction. The
compaction workload has to be evenly and fairly distributed to obtain predictable response
times. Compaction operations can be done in either an event- or a time-triggered manner:
• “In an event-triggered system a processing activity is initiated as a consequence
of the occurrence of a significant event.” – [17] Compaction is initiated upon the
occurrence of significant events, e.g. memory management API calls, and can be
performed before or after a memory management API call, where m memory objects
are moved to another location in memory.
• “In a time-triggered system, the activities are initiated periodically at predetermined
points in real-time.” – [17] Compaction operations are performed every n clock ticks,
independent of memory management API calls, where m memory objects are
moved to another location in memory.
The memory management system that we propose has bounded (constant) response
times and predictable available memory, where fragmentation is minimized. The latter is
achieved via compaction. Compaction is performed in an event-triggered manner.
2.3 Explicit dynamic memory management systems
The procedures for allocating and deallocating memory have to be called explicitly
if an explicit dynamic memory management system is used. Wilson et al. [37]
give a survey of dynamic memory allocation strategies. Masmano et al. [24] and
Puaut [29] present evaluations of explicit dynamic memory management systems
under real-time loads. In this section we give a brief overview of some established
allocators: First-fit, Best-fit, Doug Lea’s allocator, Half-fit, and Two-level segregated
fit. Note that these allocators operate on a single contiguous piece of memory.
The fragmentation problem is not explicitly handled by these explicit dynamic memory
management systems; memory compaction is not performed. The algorithms try to
align the allocated memory objects in a more or less optimal manner in the contiguous
memory. The usable memory depends on the allocation/free history of the application
and is therefore not predictable. This is unacceptable for safety-critical hard real-time
systems, where fragmentation guarantees are needed.
2.3.1 Sequential fit
First-fit and Best-fit are sequential fit allocators; [16, 34] give a detailed explanation of
these algorithms. Sequential fit allocators are based on a singly or doubly linked list
of free memory blocks. The pointers of the free list are embedded in the free blocks
themselves, so no memory is wasted.
The First-fit allocator searches the free list and takes the first free block that fits the
allocation request. The allocation request has to be smaller than or equal to the size of
the free block.
The Best-fit allocator scans the whole list and selects the free block which best fits the
allocation request.
It is obvious that these allocation strategies are not real-time. Consider a memory
constellation where the whole memory of size m consists of allocated blocks of minimum
size s and every odd block is free. In this case the free list has maximum length, i.e., in
the worst case the whole free list has to be examined to fulfill an allocation request. The
maximum number of list iterations is m/(2s). Deallocating a used memory object takes
constant time.
2.3.2 Doug Lea’s allocator
Doug Lea’s allocator [18] is a hybrid allocator which is widely used in several
environments, e.g. in some versions of Linux [35]. It uses two different types of free lists:
there are 48 free lists of the first type, which represent exact block sizes (from 16 to 64
bytes), called fast bins. The remaining free lists (the second type) are segregated free
lists, called bins. Allocation operations are handled by the corresponding free list that fits
the allocation request. The allocator uses a delayed coalescing strategy. This means that
neighboring free blocks are not coalesced after deallocation operations. Instead, a global
block coalescing is done if an allocation request cannot be fulfilled. Therefore
deallocation operations are fast and perform in constant time, but the allocation
operations offer imprecise bounds, caused by the global delayed coalescing of free
blocks that can occur. Let m denote the memory size and s the minimum block size; then
O(m/s) is the complexity of the coalescing operations that can occur if an allocation call
cannot be served directly. Therefore Doug Lea’s allocation strategy is not predictable and
not suitable for a hard real-time system.
2.3.3 Half-fit
Half-fit [25] groups free blocks of sizes in the range [2^i, 2^(i+1)) into a free list denoted
by i. Bitmaps are used to keep track of empty lists, and bitmap processor instructions are used
to find set bits in constant time. If an allocation request of size s is performed, the search
for a suitable free block starts at list i, where i = ⌊log2(s − 1)⌋ + 1, or 0 if s = 1. If list i
contains no free element, then the next free list i + 1 is examined. If a free block of a
larger size class has to be used, this free block is split into two blocks of sizes r and r′,
and the block of size r′ is reinserted into the corresponding free list. Masmano et al. [22]
showed that fragmentation is high in the Half-fit allocator, especially if many allocation
sizes are not close to a power of two.
2.3.4 Two-level segregated fit
The two-level segregated fit (TLSF) allocator [22],which is used in the RTLinux/GPL
system [23],implements a combination of a segregated free list and a bitmap allocation
strategy.The first dimension of the free list is an array that represents size classes that
are a power of two apart.The second dimension sub-divides each first-level size class
linearly.Each free list array has an associated bitmap where free lists are marked that
contain free blocks.Processor instructions are used to find an adequate free memory
location for an allocation request in constant time.If there are neighboring free blocks
after a deallocation operation,then they are immediately coalesced using the boundary
tag technique [16].Each used block contains 8 byte administration information,which
are stored in the header of the block.The first 4 bytes hold the size of the used block
and the second 4 bytes contain a physical memory reference to the previous block,with
respect of the linear order of blocks in memory.This information is necessary to perform
block coalescing in constant time.The immediate coalescing technique leads to larger
reusable memory ranges and therefore to less fragmentation in comparison to the Half-fit
approach.Since the minimal block size in TLSF is 16 bytes,the worst case administration
memory overhead is high:
8
16
= 50%.
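The two-level index computation can be sketched as follows. The function name, the choice of 4 second-level subdivisions, and the omission of TLSF's special handling of very small sizes are assumptions for illustration:

```c
#include <assert.h>
#include <stdint.h>

#define SLI_LOG 2  /* 2^2 = 4 linear subdivisions per power-of-two class */

/* Map a request size to a (first-level, second-level) free-list index.
   fl selects the power-of-two range [2^fl, 2^(fl+1)); sl subdivides that
   range into 2^SLI_LOG equal linear steps. Valid for size >= 2^SLI_LOG. */
static void tlsf_mapping(uint32_t size, unsigned *fl, unsigned *sl) {
    unsigned f = 0;
    while (((uint32_t)1 << (f + 1)) <= size)  /* f = floor(log_2(size)) */
        f++;
    *fl = f;
    *sl = (size - ((uint32_t)1 << f)) >> (f - SLI_LOG);
}
```

The bitmaps over `fl` and `sl` are then scanned with bit-scan instructions, exactly as in Half-fit, which gives the constant-time search mentioned above.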
2.3.5 Algorithm Complexity
Table 2.1 shows the complexity of the allocation and deallocation operations of the
presented explicit dynamic memory management systems. Half-fit and TLSF are the only
allocators that offer bounded (constant) time behaviour for both operations.

              Allocation    Deallocation
  First-fit   O(m/(2s))     O(1)
  Best-fit    O(m/(2s))     O(1)
  DLmalloc    O(m/s)        O(1)
  Half-fit    O(1)          O(1)
  TLSF        O(1)          O(1)

Table 2.1: Allocator Complexity
2.4 Implicit dynamic memory management systems
An implicit dynamic memory management system is in charge of collecting allocated
memory objects that are not in use anymore. Implicit dynamic memory deallocation is
known as garbage collection. The garbage collector is responsible for deallocating
sufficient unused allocated memory to handle prospective allocation requests of arbitrary
size. We do not focus on the garbage collection strategies in this section; we are
only interested in the memory management concepts of the real-time garbage collected
systems.
We examine the following established real-time garbage collected systems: the Treadmill
concept [6] with its modifications [36, 19], the time-triggered Metronome [4, 2, 5, 3], and
the event-triggered Jamaica [33, 31, 32] approach. The last two are commercial systems.
Ritzau [30] presents an extensive overview in his dissertation.
2.4.1 Treadmill
Baker's Treadmill [6] is a real-time, non-copying garbage collector that offers bounded
response times for allocation operations. The garbage collection strategy is a four-color
collection scheme; details about this approach can be found in [6]. The algorithm uses a
single block size. One free block has to be taken from the free list to handle an allocation
request. All memory blocks are stored in circular doubly-linked lists, so memory
allocation is done in constant time. Using just one block size is very restrictive and results
in high internal fragmentation. The main drawback of this approach is that unpredictably
large amounts of garbage collection work can occur.
Wilson [36] introduced segregated free lists for this algorithm, with size classes increasing
in powers of two. Allocation requests are handled by the free list that fits the allocation
request. Each list is collected separately by the Treadmill collector. A collection occurs
only if a free list becomes empty. This strategy is unpredictable and therefore not suitable
for a hard real-time system.
A page-level memory management version of the Treadmill collector is proposed in [19],
which improves memory utilization without imposing unbounded response times for
allocation requests. A page remapping scheme is used to create larger free contiguous
pieces of memory.
2.4.2 Metronome
In Metronome [4, 2, 5, 3], allocation is performed using segregated free lists. The whole
memory is divided into pages of equal size. Each page itself is divided into fixed-size
blocks of a particular size. There are n different block sizes, which lead to n different
size-classes. All pages that consist of blocks of the same size build up a size-class. Allocation
operations are handled by the smallest size-class that can fit the allocation request. This
is done in constant time. Unused pages can be used by any size-class.
Compaction operations are performed if the pages of a size-class become fragmented to
a certain degree due to garbage collection. First of all, the pages of a size-class are
sorted by the number of unused blocks per page. There is an allocation pointer, which
is set to the first not-full page of the resulting list, and a compaction pointer, which is set
to the last page of the resulting list. Allocated objects are moved from the page referenced
by the compaction pointer to the page referenced by the allocation pointer.
Compaction is performed until both pointers reference the same page.
Relocation of objects is achieved using a forwarding pointer. This pointer is located in the
header of each object. A Brooks-style read barrier [8] maintains the to-space invariant:
a mutator always sees its objects in to-space. A number of optimizations are applied to
the read barrier to reduce its cost, e.g., barrier-sinking (the barrier is sunk down to its
point of use). The mean cost of the read barrier is 4%; in the worst case it represents an
overhead of 9.6%.
Since Metronome is a time-triggered real-time garbage collector, compaction is part of
the collection cycles, which are performed at predefined points in time. It is shown that
compaction takes no more than 6% of the collection time. Therefore the compaction
overhead is bounded in the Metronome approach. The remaining time is used to detect
allocated objects that are not in use anymore. The duration of the collection interval has
to be preset in advance, specifically for each application. An improper choice of the
duration of the collection interval can lead to missed deadlines or out-of-memory errors.
2.4.3 Jamaica
Siebert presented the Jamaica [33, 31, 32] real-time garbage collector, which does not
perform compaction operations. A new object model is introduced that is based on fixed-size
blocks. The whole memory is subdivided into blocks of equal size. Small allocation
requests can be satisfied by using a single block. Larger ones require a possibly non-contiguous
set of blocks, where each block holds a reference to its successor. This non-contiguity
is the reason why compaction is not necessary anymore. Objects can be built
up of arbitrarily distributed blocks, which are connected by a singly-linked list or a tree
data structure.
When using blocks of fixed size, the most important decision is to choose an adequate
block size. Siebert proposed block sizes in the range of 16 to 64 bytes. This parameter
has to be chosen specifically for each program.
The complexity of allocation and deallocation operations depends on the size of the
affected object and the used block size. Let s denote the size of an object and let b denote
the block size. An object of size s requires n = ⌈s/b⌉ blocks. This means that if an
allocation or deallocation operation of an object of size s is performed, n list operations are
required. Therefore allocation and deallocation operations are performed in linear time
O(s/b), depending on the object size.
Memory dereferencing cannot be done in constant time using the object model of Jamaica.
Since an object is built up of non-contiguous blocks, access to the last field of an
object requires going through all the blocks of the object if they are connected via a linked
list. Therefore memory dereferencing takes linear time and depends on the location of
the field in the object.
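A minimal sketch of such a block-based object model (block size, type names, and list layout are assumptions for illustration) shows both costs: allocation touches n = ⌈s/b⌉ blocks, and accessing a field must first follow offset/b links:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define BLOCK_SIZE 32  /* payload bytes per fixed-size block (assumed) */

typedef struct block {
    struct block *next;
    unsigned char data[BLOCK_SIZE];
} block_t;

/* n = ceil(s / b) blocks are needed for an object of size s. */
static size_t blocks_needed(size_t s, size_t b) {
    return (s + b - 1) / b;
}

/* Allocation performs one list operation per block, hence O(s/b). */
static block_t *alloc_object(size_t s) {
    block_t *head = NULL;
    for (size_t i = 0; i < blocks_needed(s, BLOCK_SIZE); i++) {
        block_t *blk = calloc(1, sizeof(block_t));
        blk->next = head;  /* constant-time list insertion per block */
        head = blk;
    }
    return head;
}

/* Accessing byte `offset` first follows offset / BLOCK_SIZE links, so
   dereferencing is linear in the position of the field in the object. */
static unsigned char *field(block_t *obj, size_t offset) {
    for (size_t i = 0; i < offset / BLOCK_SIZE; i++)
        obj = obj->next;
    return &obj->data[offset % BLOCK_SIZE];
}
```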
Jamaica performs event-triggered garbage collection, which is executed when allocation
operations are performed. To guarantee that all allocation requests can be fulfilled,
m blocks have to be examined at every allocation operation. In Jamaica, the amount m of
blocks that have to be checked depends on the total amount of allocated blocks. If there
are only a few allocated blocks, then less collection has to be performed; otherwise more
work has to be done. In the worst case, if the memory is completely full and an allocation
operation is performed, all allocated objects have to be checked. Therefore the collection
overhead varies and depends on the global memory state.
2.5 Summary
Two of the presented explicit dynamic memory management systems offer bounded
(constant) response times for allocation and deallocation operations: Half-fit and TLSF. Since
TLSF handles fragmentation better than Half-fit, it is the most applicable candidate for
real-time systems. The main problem, which is not considered by these systems, is that
fragmentation is unpredictable and may be unbounded. Therefore, scenarios that lead to
high fragmentation are possible. As a result, using these systems for real-time
applications may be problematic. The memory has to be predictable.
The Treadmill approach with its modifications presents some interesting memory
management layouts, but it suffers from unpredictable collection cycles. Metronome performs
time-triggered garbage collection. The duration of the collection interval, in which
compaction is performed, has to be precisely chosen to guarantee that the real-time system
is able to meet its deadlines and that sufficient memory is always available; otherwise the
system may fail. The event-triggered Jamaica system uses an object model that avoids
external fragmentation. Here internal fragmentation is the problem that has to be
minimized by choosing an application-adapted block size. The garbage collection overhead
varies and depends on the global memory state, which degrades the predictability of the
system. A further drawback is that memory dereferencing cannot be performed in constant
time in Jamaica.
Memory management is the basis for a predictable hard real-time system. None of the
outlined dynamic memory management systems offers predictable memory operations
in combination with explicit fragmentation elimination and independence of the
global memory state. Hard real-time systems require dynamic memory management
systems that offer all of these properties.
3 Compact-fit (CF)
"Simplicity is prerequisite for reliability." – Edsger W. Dijkstra
The following chapter describes the model of the compacting real-time memory management
system Compact-fit (CF) and presents the main design decisions. It abstracts from
any data-related aspects such as data organization and administration.
Different element colors in status diagrams are used to represent the memory states of
the CF entities:
Figure 3.1: Memory states: (a) unused, (b) used, (c) internal fragmentation, (d) page-internal fragmentation
Furthermore, the abstract term memory object is used to describe an allocated memory
range.
3.1 Compaction
Compaction can be used to bound fragmentation. However, a stop-the-world approach,
where the whole memory is compacted at once, would degrade predictability and is not
suitable for a real-time system. Therefore the compaction workload has to be distributed
incrementally and fairly over memory operations to obtain predictable timing behaviour.
The Jamaica system [32] presented in Section 2.4.3 is an exception. Its object model
eliminates external fragmentation, hence compaction is not necessary anymore. However,
the workload is not removed; it is shifted to memory dereferencing, where more work
has to be done for the rear elements of a memory object. In CF, dereferencing operations
have to be done in constant time, so the Jamaica object model does not meet our
requirements.
The compaction workload normally consists of two major tasks: copying memory objects
and updating all references that point to the moved memory object [10]. Copying memory
objects can be bounded by the size of the memory object, but the amount of reference
updates is unpredictable. In the worst case, n allocated memory objects hold a reference to
the moved memory object, which would lead to n reference updates. Furthermore, these n
references have to be found in memory. Predictability can be achieved if memory objects
and direct references to memory objects are decoupled. The decoupling mechanism is
described in the following section.
3.1.1 Abstract and concrete address space
Conceptually, there are two memory layers: the abstract address space and the concrete
address space. Allocated memory objects are physically placed in contiguous portions of
the concrete address space. For each allocated memory object, there is exactly one entity
of the abstract address space. No direct references from applications to the concrete
address space are possible: an application references the abstract address of a memory
object, which in turn uniquely determines the memory object's position in the concrete
address space. Therefore the applications and the memory objects (in the concrete
address space) are decoupled. All memory operations operate on abstract addresses.
We start by defining the needed notions and notations.
Definition 1 The abstract address space is a finite set of integers denoted by A.
Definition 2 An abstract address a is an element of the abstract address space, a ∈ A.
Definition 3 The concrete address space is a finite interval of integers denoted by C.
Note that since it is an interval, the concrete address space C is contiguous. Moreover,
both the concrete and abstract address spaces are linearly ordered by the standard
ordering of the integers.
Definition 4 A concrete address c is an element of the concrete address space, c ∈ C.
Definition 5 A memory object is an element of the set of memory objects, i ∈ I(C). For
each memory object, two elements of the concrete address space c_1, c_2 ∈ C, such that
c_1 ≤ c_2, define its range, i.e. we have i = [c_1, c_2] = {x | c_1 ≤ x ≤ c_2}.
As mentioned above, each abstract address refers to a unique range of concrete
addresses, which represents a memory object. Vice versa, the concrete addresses of an
allocated memory object are assigned to a unique abstract address. To express this
formally, we define a partial map that assigns to each abstract address the interval of
concrete addresses that it refers to.
The abstract address partial map address: A ⇀ I(C) maps abstract addresses to memory
objects. We say that an abstract address a is in use if address(a) is defined. The
abstract address map is injective, i.e., different abstract addresses are mapped to
different subintervals; moreover, for all abstract addresses a_1, a_2 ∈ A that are in use, if
a_1 ≠ a_2, then address(a_1) ∩ address(a_2) = ∅.
Accessing a specific element in the concrete address space C requires two pieces of
information: the abstract address a and an offset o, pointing out which element of the
memory object m = address(a) is desired. Hence the next definition:
Definition 6 An abstract pointer, denoted by a_p, is a pair a_p = (a, o), where a is an abstract
address in use and o is an offset, o ∈ {0, ..., |address(a)| − 1}. By |·| we denote the
cardinality of a set.
Definition 7 The abstract pointer space is the set of all abstract pointers a_p, and it is
denoted by A_p.
There is a one-to-one correspondence between A_p and C. Each abstract pointer a_p
refers to a unique concrete address c via the abstract pointer mapping pointer: A_p → C.
It maps an abstract pointer a_p = (a, o) to the concrete address of the memory object
m = address(a) that is at position o with respect to the order on address(a).
These definitions and mappings are clarified by an example. Let the abstract address
space A consist of 3 elements, A = {1, 2, 3}, and the concrete address space C
consist of 10 elements, C = {1, 2, ..., 10}. Assume that three memory objects of
different size (different amounts of concrete addresses) are allocated: address(1) = [2, 3],
address(2) = [6, 7] and address(3) = [8, 10]. The abstract addresses together with their
offsets create abstract pointers, which are mapped to C. For example, pointer(1, 1) = 3
and pointer(3, 2) = 10. Figure 3.2 depicts this situation.
Figure 3.2: Abstract address and pointer mapping
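The example can be replayed in a few lines of C; the table layout is only illustrative:

```c
#include <assert.h>

/* The example above: start and end of address(a) for a = 1..3
   (index 0 is unused so that the array index matches the address). */
static const int start[4] = { 0, 2, 6, 8 };
static const int end_[4]  = { 0, 3, 7, 10 };

/* pointer(a, o): the o-th concrete address of the memory object
   address(a). Only the table entry is consulted, so the mapping is
   evaluated in constant time. */
static int pointer(int a, int o) {
    assert(o >= 0 && o <= end_[a] - start[a]);  /* offset within the object */
    return start[a] + o;
}
```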
The following examples are more concrete in the sense of implementation and show the
benefit of an abstract address space A. Consider an application that allocates memory
objects and holds references to A, which is realized by a contiguous pointer indirection
table. In the examples, the pointer indirection table is called the proxy table. The proxy table
entries refer to the concrete address space C, the real memory.
Figure 3.3 illustrates how dependencies of memory objects are handled. Large data
structures often consist of a number of allocated memory objects connected via references
(e.g. linked lists, trees, ...). Compaction operations lead to reference updates in
these data structures. The amount of reference updates is unpredictable if these references
are direct. Therefore each memory reference has to be indirect, i.e., each reference is an
abstract pointer. This situation is shown in Figure 3.3.
Figure 3.3: Memory object dependencies
Indirect referencing provides predictability of the reference updates during compaction. If
fragmentation occurs, the concrete address space C gets compacted and the references
from the abstract address space A to the concrete address space C are updated, as
shown in Figures 3.4 and 3.5. Hence, objects are moved in C and references are updated
in A. The number of reference updates is bounded: movement of one memory object in
C leads to exactly one reference update in A. In contrast, direct referencing (related to
object dependencies) implies an unpredictable number of reference updates. This is why
we chose an abstract address space design.
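A small sketch (with a hypothetical proxy-table layout) shows why this bound holds: moving an object changes exactly one table entry, while every abstract pointer held by the application remains valid unchanged:

```c
#include <assert.h>

#define OBJECTS 4

/* Proxy table: entry a holds the current concrete start address of the
   memory object with abstract address a (hypothetical layout). */
static int proxy[OBJECTS];

/* Compaction moves one object in C and performs exactly ONE update in A;
   abstract pointers (a, o) held by the application are never touched. */
static void move_object(int a, int new_start) {
    /* ... copy the object's bytes to new_start in concrete memory ... */
    proxy[a] = new_start;  /* the single bounded reference update */
}

/* pointer(a, o), realized via the proxy table. */
static int deref(int a, int o) {
    return proxy[a] + o;
}
```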
Figure 3.4: Fragmented concrete address space
Figure 3.5: Compacted concrete address space
3.2 CF API
The CF system provides three explicit memory operations. The implementations of these
functions are discussed in Chapter 4. Here we just describe them in an abstract way:
• The allocation operation takes an integer size > 0 as argument and creates a
memory object of the given size. It returns an abstract pointer a_p = (a, o), where a is an
abstract address that references the allocated memory object and the offset o is
set to 0, the beginning of the memory object.
• The deallocation operation takes an abstract address a as argument and frees the
memory object that belongs to this abstract address. The abstract address mapping
is released.
• The dereference operation returns the concrete address c of an abstract pointer
a_p = (a, o), where a is the abstract address of a memory object and the offset o
points to the actual position within the memory object.
Note that the abstract address of an allocated memory object never changes until the
memory object gets freed. The concrete address(es) of an allocated memory object may
change due to compaction operations.
At this point, we point out another difference between the abstract and the concrete
address space. Over time, both may get fragmented. The fragmentation of the concrete
address space represents a problem, since upon an allocation request the memory
management system must provide a sufficiently large contiguous memory range of a given
size. In the case of the abstract address space, a single address is used per memory object,
independent of its size. Hence, upon an allocation request, the memory management
system merely needs to find a single unused abstract address. We can achieve this within
a constant time bound, without keeping the abstract address space compact. Details of
the implementation follow in Chapter 4.
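One way to find an unused abstract address in constant time, without keeping A compact, is a free list threaded through the address table. This is only a sketch of the idea under that assumption; the actual implementation is described in Chapter 4:

```c
#include <assert.h>

#define A_SIZE 8  /* size of the abstract address space (assumed) */

static int next_free[A_SIZE];  /* intrusive free list threaded through A */
static int free_head = 0;

static void init_addresses(void) {
    for (int a = 0; a < A_SIZE; a++)
        next_free[a] = a + 1;  /* A_SIZE acts as the "none" marker */
    free_head = 0;
}

/* O(1): pop an unused abstract address; A itself stays fragmented. */
static int acquire_address(void) {
    int a = free_head;
    if (a != A_SIZE)
        free_head = next_free[a];
    return a;  /* A_SIZE signals that A is exhausted */
}

/* O(1): return an abstract address to the free list on deallocation. */
static void release_address(int a) {
    next_free[a] = free_head;
    free_head = a;
}
```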
3.3 Size-classes concept
The size-classes concept of Metronome [3] was chosen to administrate the concrete
address space:
• Pages: The memory is divided into units of a fixed size P, called pages. For
example, in our implementation each page has a size P = 16KB.
• Page-blocks: Each used page is subdivided into page-blocks. All page-blocks in
one page have the same size. In total, there are n predefined page-block sizes
S_1, ..., S_n where S_i < S_j for i < j. Hence the maximal page-block size is S_n.
• Size-classes: Pages are grouped into size-classes. There are n size-classes (just
as there are n page-block sizes). Let 1 ≤ i ≤ n. Each page with page-blocks of
size S_i belongs to the i-th size-class. Furthermore, each size-class is organized as
a doubly-circularly-linked list.
Every allocation request is handled by a single page-block. When an allocation request
for a given size arrives, we determine the best fitting page-block size S_i and insert the
memory object into a page-block of a page that belongs to size-class i. The best fitting
page-block size is the unique page-block size S_i that satisfies S_{i−1} < size ≤ S_i.
A page that becomes empty is removed from its size-class and moved to the pool
of free pages. It can be reused in any size-class again.
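The best-fit selection can be sketched as follows. The size values and the linear scan are illustrative only; an implementation aiming at constant-time bounds would use a lookup table or a bit-scan instruction instead of the loop:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical page-block sizes S_1 < ... < S_n, in bytes. */
static const size_t S[] = { 32, 36, 41, 47, 53 };
#define NCLASSES (sizeof(S) / sizeof(S[0]))

/* Return the (0-based) index of the best fitting page-block size, i.e.
   the unique i with S[i-1] < size <= S[i], or -1 if size exceeds S_n. */
static int size_class(size_t size) {
    for (size_t i = 0; i < NCLASSES; i++)
        if (size <= S[i])
            return (int)i;
    return -1;  /* larger than the maximal page-block size S_n */
}
```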
Figure 3.7 at the end of this chapter shows an exemplary view of the organization of
the concrete address space: there are three size-classes; in one of them there are two
pages, in the other two there is a single page per class.
3.4 Types of fragmentation
The size-classes concept leads to three different types of fragmentation:
• Page-block-internal fragmentation
• Page-internal fragmentation
• Size-external fragmentation
We will briefly discuss each of them and the impact they have on our design decisions.
The properties are similar to those of Metronome [3]. Figure 3.7 at the end of this chapter
shows the different types of fragmentation as well.
3.4.1 Page-block-internal fragmentation
Page-block-internal fragmentation is the unused space at the end of a page-block. Given
a page p in size-class i (the page-blocks in p have size S_i), let b_j for j = 1, ..., B_p be the
page-blocks appearing in the page p, where B_p = P div S_i. For a page-block b_j we define
used(b_j) = 1 if b_j is in use, and used(b_j) = 0 otherwise. We also write data(b_j) for the
amount of memory of b_j that is allocated. The page-block-internal fragmentation for the
page p is calculated as

    F_B(p) = Σ_{j=1}^{B_p} used(b_j) · (S_i − data(b_j)).
The used memory of the page is

    U_M(p) = Σ_{j=1}^{B_p} used(b_j) · data(b_j).

One can also calculate the total page-block-internal fragmentation in the memory by
summing up the page-block-internal fragmentation over all pages. Let p_j for j = 1, ..., P_p be
the pages of the system and used(p_j) = 1 if p_j is in use. The total page-block-internal
fragmentation is

    TF_B = Σ_{j=1}^{P_p} F_B(p_j) · used(p_j).

The total used memory of all pages is

    TU_M = Σ_{j=1}^{P_p} U_M(p_j) · used(p_j).
The relative page-block-internal fragmentation of a page is

    f_B(p) = F_B(p) / U_M(p)

and the relative page-block-internal fragmentation of all used pages is

    Tf_B = TF_B / TU_M.
The total page-block-internal fragmentation can be bounded by a factor f if the page-block
sizes are chosen carefully. Berger [7] suggests the following ratio between adjacent
page-block sizes:

    S_k = ⌈S_{k−1} · (1 + f)⌉    (3.1)

for k = 2, ..., n. The size of the smallest blocks S_1 and the parameter f can be chosen
specifically for each program. Bacon et al. [3] proposed a factor f = 1/8. Such a factor leads
to minor size differences between the smaller size-classes and major differences between
the larger size-classes. Figure 3.6 shows this property (S_1 = 32).
Figure 3.6: Bounded internal fragmentation, f = 1/8
The example in Figure 3.7 shows the occurrence of internal fragmentation.
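Formula (3.1) with f = 1/8 and S_1 = 32 yields the first sizes plotted in Figure 3.6; a sketch in integer arithmetic:

```c
#include <assert.h>

/* S_k = ceil(S_{k-1} * (1 + f)) with f = 1/8, computed in integers as
   ceil(S * 9 / 8) via the usual ceiling-division idiom. */
static void make_size_classes(int s1, int n, int out[]) {
    out[0] = s1;
    for (int k = 1; k < n; k++)
        out[k] = (out[k - 1] * 9 + 7) / 8;  /* ceiling division by 8 */
}
```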
3.4.2 Page-internal fragmentation
Page-internal fragmentation is the unused space at the end of a page. If all possible
block sizes S_1, ..., S_n are divisors of the page size P, then there is no page-internal
fragmentation in the system. However, if one uses formula (3.1) for the choice of
block sizes, then one also has to account for page-internal fragmentation. For a
page p in size-class i, it is defined as

    F_P(p) = P mod S_i.

Let p_j for j = 1, ..., P_p be the pages of the system. The total page-internal fragmentation
is the sum of F_P(p) taken over all used pages:

    TF_P = Σ_{j=1}^{P_p} F_P(p_j) · used(p_j).

The relative page-internal fragmentation of a page is

    f_P(p) = F_P(p) / U_M(p)

and the relative page-internal fragmentation of all used pages is

    Tf_P = TF_P / TU_M.

Page-internal fragmentation is also shown in Figure 3.7.
3.4.3 Size-external fragmentation
Size-external fragmentation measures the unused space in a used page. This space is
considered fragmented or "wasted" because it can only be used for allocation requests
in the given size-class. For example, let p be a page in size-class S_i = 32B. If only
one block of p is allocated, then P − 32B of this page remain unused. If no
more allocation requests arrive for this size-class, then this unused memory can never
be used again. In such a situation, an object of size 32B consumes the whole page. The
size-external fragmentation of a page p in size-class i is bounded by

    F_S(p) = P − S_i.

In general, the total size-external fragmentation in the system is bounded by the sum of
F_S(p) over all pages.
In our system at most one page per size-class will be in use but not full. Therefore, the
size-external fragmentation of the whole size-class is the size-external fragmentation of
only one page. Details about this property are explained in the next chapter.
Given that there are n predefined page-block sizes S_1, ..., S_n and a page is of size P,
the total size-external fragmentation is bounded by

    TF_S = Σ_{i=1}^{n} (P − S_i).

The more size-classes there are in the system, the less page-block-internal fragmentation
occurs, but the size-external fragmentation grows. Hence, there is a trade-off
between page-block-internal and size-external fragmentation, which must be considered when
designing the size-classes.
For example, the free page-blocks of size-class 3 in Figure 3.7 represent size-external
fragmentation.
3.4.4 Fragmentation overview
Figure 3.7 shows an exemplary view of the organization of the concrete address space:
there are three size-classes; in one of them there are two pages, in the other two there
is a single page per class.
Figure 3.7: Size-classes and different types of fragmentation
3.5 Summary
In this chapter we presented the size-classes concept of CF and introduced the abstract
and the concrete address space. Furthermore, we defined formal mappings of abstract
addresses to concrete addresses. At the end of this chapter we gave a theoretical
analysis of the different types of fragmentation and presented bounds for these
types of fragmentation. Having such bounds results in predictable fragmentation for our
memory management system.
4 CF system
"We used to dream about this stuff. Now, we get to build it. It's pretty neat." –
Steve Jobs
This chapter consists of the following parts: first we describe the compaction algorithm of
CF. Afterwards the free-list concept is presented, and the details of the concrete address
space management, i.e., the administration of the pages and size-classes, are discussed.
Next we present two implemented versions of CF: the moving and the non-moving
implementation. Both implementations are written in C and assembler for the IA-32 and ARM
architectures. Furthermore, we do a complexity analysis for each CF entity and show
that the response times are bounded.
4.1 Compaction
As described in Section 2.2, compaction in CF is performed in an event-triggered manner.
The memory has to be in a consistent state after a compaction operation, i.e., a compact
state. Therefore each compaction operation has to make sufficient progress to keep the
memory compact. Furthermore, the response times of the compaction operations must
be bounded to satisfy the real-time requirements.
Consider the example in Figure 4.1. It shows a size-class which consists of three pages.
Each page is fragmented to a certain degree. Compacting these pages within tight time
bounds is not possible. The fragmented state of these pages was needlessly produced,
which degrades predictability. Furthermore, consider the worst case, where every page
contains just one allocated memory object. Such situations have to be avoided.
The key idea in the CF model is to keep the memory size-class-compact at all times. In
other words, at most one page of each size-class may be not full at any time, while all
other pages of each size-class are always kept full. To obtain a size-class-compact memory
state, the compaction algorithm of CF behaves as follows: compaction is performed
after a deallocation, on the size-class affected by the deallocation request. It implies
movement of only one memory object in the affected size-class. Therefore the compaction
operations are bounded. Since compaction is performed after the deallocation operation,
which could lead to memory holes, the memory is brought immediately into a size-class-compact
state again.
Figure 4.1: Arbitrarily fragmented pages of a size-class
The compaction strategy of Metronome [3] stands in contrast to ours. It
uses a time-triggered approach, where a function tells the garbage collector how many
pages have to be defragmented during each collection cycle. If the number of free pages
falls below a given threshold, then elements of the mostly empty pages are moved to the
mostly full pages of a size-class. The compaction workload is bounded, but it varies
because of the variety of possible fragmentation states of the pages of a size-class.
Therefore the predictability of the duration of compaction cycles is strongly degraded.
CF performs compaction operations fairly distributed over deallocation operations, and the
response times depend only on the affected size-class. The compaction strategy of CF is
thus more deterministic than the compaction strategy of Metronome.
4.1.1 The compaction algorithm
Before presenting the algorithm, we state two invariants and two rules that are related to
our compaction strategy. Recall that each size-class is a doubly-circularly-linked list.
Invariant 1 There exists at most one page in a size-class which is not full.
Invariant 2 If there exists one not-full page in the size-class, then this page has to be the
last element of its size-class list.
The compaction algorithm acts according to the following two rules.
Rule 1 If a memory object of a full page p in a size-class gets freed, and there exists no
not-full page, then p becomes the not-full page of its size-class and is placed at the end
of the size-class list.
Rule 2 If a memory object of a full page p in a size-class gets freed, and there exists a
not-full page p_n in the size-class, then one memory object of p_n moves to p. If p_n becomes
empty, then it is removed from the size-class.
Not every deallocation request requires moving a memory object. The cases in which no
moving is necessary are:
• There is only one page in the size-class where the deallocation happened. No work is
needed in this case.
• The deallocated memory object is in the unique not-full page of the size-class.
Again, this case imposes no work, except when the deallocated memory object is
the only memory object in the page. Then the page is removed from the size-class.
• There is no not-full page in the size-class where the deallocation happened. In this
case we only need to do a fixed number of list-reference updates so that the
affected page becomes the last page in the size-class list.
Note that when a memory object moves from one page to another, we need to
perform a reference update in the abstract address space, so that the abstract
address of the memory object points to its new location.
The compaction algorithm is presented in Listing 4.1. The algorithm is not called if the
deallocation of a memory object leads to an empty page. An empty page is immediately
removed from its size-class, and the size-class remains in a size-class-compact state.
Details of the deallocation operation of CF are shown in Sections 4.4.3 and 4.5.3.
void compaction(affected_page, size_class) {
    if (affected_page != last_page) {
        if (is_full(last_page)) {
            set_last(affected_page);
        }
        else {
            move(object, last_page, affected_page);
            abstract_address_space_update(object);
        }
    }
}
Listing 4.1: The CF compaction algorithm
In the example in Figures 4.2 and 4.3, Rule 1 of the CF compaction algorithm is applied.
Each page of the size-class in Figure 4.2 is completely full, until the green-marked memory
object of the first page gets freed. Since there is just one free page-block in the
size-class after deallocating this memory object, no compaction has to be performed.
The affected page of the deallocation operation becomes the not-full page of its size-class
and is moved to the end of its size-class list. The invariants hold, and the state of
the size-class-compact size-class is shown in Figure 4.3.
Figure 4.2: The green-marked memory object becomes deallocated
Figure 4.3: The size-class after applying Rule 1
The example in Figures 4.4 and 4.5 shows Rule 2 of the compaction algorithm. There are
two completely full pages and a not-full page in the size-class. The green-marked memory
object of the first page becomes deallocated. Hence there are two pages after the
deallocation operation which are not full. The invariants do not hold anymore. Therefore
a memory object of the last page of the size-class list is moved to the free page-block of
the first page, where the memory object was deallocated. After applying the compaction
algorithm there is just one page in the size-class which is not full, and the invariants hold.
The size-class is in a size-class-compact state.
Figure 4.4: The green-marked memory object becomes deallocated
Figure 4.5: The size-class after applying Rule 2
4.1.2 Complexity
The CF compaction algorithm creates size-class-compact size-classes. This is achieved
by adding additional complexity to the free call. Let a denote an abstract address of a
memory object and ζ(a) denote the size-class of the memory object. The complexity of
deallocating a memory object, which is referenced by an abstract address a, consists of
the complexity of removing the memory object, Θ(remove(ζ(a))), which depends on
the page-block size of the size-class. Additionally, if the size-class ζ(a) is not size-class-compact
after a deallocation operation, a memory object of the last page (not-full page)
of the affected size-class ζ(a) has to be moved to the free page-block of the page where
the memory object was deallocated. The complexity of the compaction operation is
denoted by Θ(compact(ζ(a))) and depends on the page-block size of the size-class. Let
do_compaction(ζ(a)) = 1 if a size-class is not size-class-compact after a deallocation
operation; it indicates that compaction must be performed.
The complexity of freeing a memory object of ζ(a) depends on ζ(a) and is expressed by
Θ(free(a)) = Θ(remove(ζ(a))) + Θ(compact(ζ(a))) · do_compaction(ζ(a)) (4.1)
The exact asymptotic behaviour of the deallocation of a memory object can be determined
if the asymptotic behaviours of all its sub-functions are known. These sub-functions
are examined in the next sections.
4.2 The free-list concept
At various points in our algorithms, we need to detect a free entry in an interval of entries
in constant time. An entry can be an element of either the abstract or the concrete memory
space. For example, we need to detect a free page-block within a page, or a free abstract
address within the abstract address space, in constant time.
An idea is to organize the free entries in a linked LIFO list and keep a record of the head
of the list. Each free entry contains a reference to the next free entry in the list. Since
this reference is stored in free space (a free entry), no memory is wasted. This approach
provides constant time for returning a free entry under the assumption that the list is
initialized. However, initialization takes linear time in the number of entries that we keep
track of, since at initialization time all entries are free. Therefore, in
order to achieve constant time, we use a more advanced concept of a free-list.
A free-list is a data structure with two designated fields: next and head. The field next
records the (ordinal) number of the next never-used entry, and head is the head of a LIFO
list that will be the list of free entries after the initialization phase. Initially, next refers
to the first entry and the list of free entries is empty.
If a never-used entry gets used, we increment the value of next. If a used entry gets
freed, then we add it to the list of free entries, at the head of the list. In order to get a free
entry: if next ≤ number_entries, then we take the (never-used) free entry at position next;
otherwise we take the entry of the list of free entries to which head points.
The free-list algorithm is shown in Listing 4.2.
struct free_list_entry *get_entry(free_list) {
    if (free_list->next > free_list->number_entries) {
        free_entry = free_list->head;
        free_list->head = free_list->head->next_entry;
    }
    else {
        free_entry = free_list->head;
        free_list->head += free_list->entry_size;
        if (mode_switch()) {
            free_list->next = free_list_mode;
        }
    }
    return free_entry;
}
Listing 4.2: The general free-list get_entry algorithm
A free entry is added to the free-list by a simple stack push operation, which is done in
constant time (4.2).
Θ(add_free_entry()) = Θ(1) (4.2)
Listing 4.2 shows that a free entry can be found in constant time using the free-list concept.
Let Θ(next()) denote the complexity of finding a free never-used entry at position
next and let Θ(head()) denote the complexity of returning the head entry of the list of
free entries. Furthermore let fl_mode() = 1 if the free-list algorithm returns the entry
referenced by the next pointer; otherwise it is 0. The complexity of the two modes of the
get_free_entry() function is shown in (4.3).
Θ(get_free_entry()) = Θ(next()) = Θ(1) if fl_mode() = 1,
Θ(get_free_entry()) = Θ(head()) = Θ(1) if fl_mode() = 0. (4.3)
Therefore the complexity of returning a free entry using the get_entry() function is
constant.
Θ(get_free_entry()) = Θ(1) (4.4)
4.3 Page management internals
The concrete address space is divided into pages of equal size. In our implementation
we use pages of size 16KB. The page size can be modified for specific applications. The
minimal page-block size S_1 in our implementation is 32B, which can also be changed;
the smallest supported page-block size is 16B. The successive page-block sizes are
determined by formula (3.1), taking f = 1/8.
In addition to the 16KB storage space, each page has a header with additional information
used for the memory management. The layout of a page with all administration
information is illustrated in Figure 4.6.
Figure 4.6: Page Layout
In the following we discuss in detail each field of the page header and how they are used
by CF to manage the memory.
4.3.1 Size-class list
The fields Next Page and Previous Page contain memory references to the next and
the previous page within the size-class. These two references build the size-class list,
which is a doubly-circularly-linked list. A singly-linked list is not sufficient for this purpose.
Consider the case that a page has to be removed or moved to another position in the
size-class list. The predecessor of the affected page is not known if a singly-linked list is
used. Finding the predecessor of the page would require, in the worst case, going through
the whole size-class list until the desired page is found. Therefore a reference to the
successor and the predecessor page is needed for each page. Otherwise the size-class
list operations cannot be performed in constant time. Using a circularly-linked list has
the advantage that the last element of the list can be reached immediately from the list
head. The last page of a size-class has a special role in the CF compaction algorithm, as
described above in Section 4.1.1: it is the not-full page.
Both page header fields contain memory references and require 32 bits each. Since all
free pages are in a global page free-list and size-class operations are performed on
the doubly-circularly-linked size-class list, the size-class list functions remove_page(),
add_page(), get_head(), get_tail(), get_predecessor() and get_successor() are done in
constant time.
Θ(remove_page()) = Θ(add_page()) = Θ(1)
(4.5)
Θ(get_head()) = Θ(get_tail()) = Θ(1)
(4.6)
Θ(get_predecessor()) = Θ(get_successor()) = Θ(1) (4.7)
4.3.2 Size-class reference
The field Size-Class refers to the size-class instance of the page, which further refers to
the head of the size-class list. Hence, we can directly access the head, and therefore also
the tail, of the size-class list from any page, which is important for compaction.
To clarify the importance of this field, consider the following case: a memory object
becomes deallocated. The abstract address of the memory object and the referenced
concrete address (page-block) contain no information about the size-class of the memory
object, but this information is important to determine the size of the memory object.
Otherwise we do not know how much memory has to be freed. Furthermore, we need to
find out if compaction is necessary and has to be performed. Therefore each page has to
know its size-class.
The Size-Class field takes 32 bits, since it has to hold the memory reference to the
size-class of the page. The functions get_sizeclass() and get_size() take constant time.
Θ(get_sizeclass()) = Θ(get_size()) = Θ(1) (4.8)
4.3.3 Number of used page-blocks
The field #Used Page-Blocks contains the number of used page-blocks in the page. It is
used to determine whether a page is full or not. The function free_entries() represents
one integer comparison (#Used Page-Blocks < Page-Blocks of Page) and takes constant
time. If a new memory object is allocated, the #Used Page-Blocks field of the last page
of the size-class list is checked. A new page has to be used for the allocation request if
this field shows that no page-blocks are free within the page.
Θ(free_entries()) = Θ(1) (4.9)
4.3.4 Free page-blocks
A page consists of page-blocks, some of which are in use and some are free. If free
page-blocks exist within a page, then they have to be found in constant time. For this
purpose each page maintains a free-list that keeps track of free page-blocks (cf.
Section 4.2). The fields Free-List Next and Free-List Head constitute the free-list. In our
configuration, the minimal page-block size is 32B. Hence, there are at most 512 page-blocks
in a page. Consequently, 16 bits are enough for Free-List Next and for #Used
Page-Blocks.
Using the free-list concept avoids the initialization of the list of free page-blocks.
Pre-initialization leads to unpredictable timing behaviour. Adding a new page to a size-class
has to be done fast and in constant time. A mutator that allocates a memory object of a
size-class which consists of full pages should not notice that a new page has to be used
to handle the allocation request.
4.3.5 Used page-blocks
Additionally to the free page-block list, a special data structure is needed that keeps track
of the used page-blocks within a page. For compaction purposes a used page-block of
the last page (not-full page) of the affected size-class has to be found in constant time
and has to be moved to the page where the memory object was deallocated. Finding
a used page-block by traversing the last page of the affected size-class linearly leads to
unpredictable timing behaviour and has to be avoided.
The following two strategies are implemented and can be used to find a used page-block
of a page in constant time.
Used page-block list
The used page-blocks of a page can be held in a doubly-circularly-linked list. Using
a singly-linked list has the disadvantage that the predecessor of a list element is not
known and therefore page-blocks cannot be removed easily. The memory costs of the
list pointers matter for the used page-block list, because the references are located at
the end of each page-block, so this memory cannot be used by the memory object
anymore. A reference to a memory location requires 32 bits. As we use a doubly-linked
list, every page-block requires 2 references, so in total 64 bits are spent for this purpose.
For a page-block size of 32 bytes this leads to an 8/32 = 25% memory overhead per
page-block. Using 16-bit indices instead of memory references reduces the memory
overhead. The index of a page-block represents the position of the page-block within
the page, with respect to the linear order of the page-blocks in the memory of the page
(ordinal number). The memory overhead is thus halved, to 12.5%.
An example of the free-list of page-blocks and a used page-block list of a page is
illustrated in Figures 4.7 and 4.8. The integers in the headline represent the index (ordinal
number) of the page-blocks within the page (linear order). The free page-blocks are white,
and they hold, at the end of the memory slot, the reference to the next free page-block.
This reference is a physical memory address; the example uses page-block indices for
simplicity. A used page-block (marked yellow) contains two indices, which represent the
doubly-linked list. In Figure 4.7, the free-list is in next-page-block mode, therefore the
free-list pointer points to the next unused page-block. The used page-block list pointer
points to page-block 2, which points to its predecessor 5 and to its successor 3. After a
few mutator operations the page changes its free-list strategy and uses the free-list head
to find free page-blocks. This is shown in Figure 4.8, where page-block 4 contains a
free-list reference to its successor.
Figure 4.7: Used page-block list and free page-block list (next-page-block mode)
Figure 4.8: Used page-block list and free page-block list (free-list mode)
2-dimensional used page-block bitmap
Inspired by [22], where a 2-dimensional bitmap is used to administrate the whole memory,
as described in Section 2.3.4, we use a two-dimensional bitmap to administrate the used
page-blocks of a page. Since the minimal page-block size is 32B, there are at most 512
page-blocks in a page. Hence, we use a bitmap of size 16 × 32 to record the status.
In addition, we need 16 more bits to record, for each row of the bitmap, whether at least
one bit is set. This additional bitstring is essential for constant-time access to a used
page-block within the page. Namely, there are CPU instructions that find a set bit in a
bitstring in constant time. These instructions are limited to bitstrings of length 32 (on
a 32-bit CPU), which is the reason why we use such a two-dimensional bitmap. So, in
order to get a used page-block, we first search for a set bit in the additional bitstring,
and then get a set bit in the corresponding row of the bitmap. Note that if such CPU
instructions do not exist, these functions can be implemented in C with a resulting
logarithmic complexity.
A two-dimensional bitmap example is shown in Figure 4.9. The first dimension of this
bitmap indicates that two bitstrings, the sixth and the fifteenth bitstring of the second
dimension, contain no used page-block. A used page-block within a page is found as
follows: a CPU instruction is applied on a bitstring, which returns the position of the
least significant set bit in the bitstring. Listings 4.3 and 4.4 show the assembler code of
the function fls, which returns the least significant set bit of a bitstring x in constant
time. Applying this function on the bitstring of the first dimension of the example in
Figure 4.9 returns 0. This value indicates that the first bitstring contains at least one set
bit, i.e., at least one of the first 32 page-blocks of the page is in use. Therefore the fls
function is applied on the first bitstring of the second dimension, which finds a used
page-block. The value 0 is returned since the first page-block is in use. The value -1 is
returned if no bit is set in the bitstring (no page-block of the page is in use).
static inline int fls(int x) {
    int r;
    __asm__("bsrl %1,%0\n\t"
            "jnz 1f\n\t"
            "movl $-1,%0\n"
            "1:" : "=r"(r) : "g"(x));
    return r;
}
Listing 4.3: Returns the least significant bit of a bitstring x (IA-32 code)
static inline int fls(int x) {
    int r;
    __asm__("clz\t%0,%1" : "=r"(r) : "r"(x) : "cc");
    return 31 - r;
}
Listing 4.4: Returns the least significant bit of a bitstring x (ARM code)
Figure 4.9: Two-dimensional bitmap (16 × 32)
Complexity
Independent of using a used page-block list or a 2-dimensional used page-block bitmap
(which requires processor support), we have constant time complexity for adding, getting,
and removing an element. This is shown in Equations 4.10 and 4.11.
Θ(add_pb_usedlist()) = Θ(get_pb_usedlist()) = Θ(remove_pb_usedlist()) = Θ(1)
(4.10)
Θ(add_pb_bitmap()) = Θ(get_pb_bitmap()) = Θ(remove_pb_bitmap()) = Θ(1) (4.11)
4.3.6 Memory Overhead
The data structure used in the page header to administrate the used page-blocks
determines the size of the page header. If a used page-block list is used to administrate
the memory of a page, the page header takes 24B; the memory overhead is then less than
0.15%. If the used page-block bitmap is used, the page header takes 88B, and the
memory overhead is less than 0.6%. Both implementations introduce an insignificant
memory overhead for the page header.
4.4 Moving Implementation
In the moving implementation of CF, memory objects are moved within the memory
during compaction. We present the allocation, deallocation, and dereferencing algorithms
of this implementation and their complexity.
4.4.1 Concept
The abstract address space is implemented as a contiguous piece of memory. The free
entries of the abstract address space are organized in a free-list.
The concrete address space is organized as described in Section 4.3. Hence, each page
is implemented as a contiguous piece of memory as well. Moreover, each page-block
contains an explicit reference to its abstract address in the abstract address space. This
is illustrated in Figure 4.10. This reference is located at the end of the page-block. It
takes 12.5% of the memory of the page-block of the smallest size-class, which is the
worst case.
Figure 4.10: Explicit reference of a page-block to an abstract address
void **cfm_malloc(size) {
    page = get_page_of_size_class(size);
    page_block = get_free_page_block(page);
    set_used(page, page_block);
    return create_abstract_address(page_block);
}
Listing 4.5: Allocation - moving implementation
4.4.2 Allocation
The algorithm for allocation, cfm_malloc, is presented in Listing 4.5. The method
get_page_of_size_class returns a reference to a page of the corresponding size-class
in constant time: if all pages in the size-class are full, then with the help of the free-list of
free pages, we get a new page; otherwise the not-full page of the size-class is returned.
Consequently, this method executes in constant time. The method get_free_page_block
takes constant time as well, using the free-list of free page-blocks of a page. Declaring a
page-block used is a bit-set operation. As mentioned above, the free abstract addresses
are also organized in a free-list, so the method create_abstract_address takes constant
time. It gets a free abstract address and creates the indirection pointer to the page-block
of the memory object in the concrete address space. As a result, cfm_malloc takes constant
time, i.e., Θ(cfm_malloc()) = Θ(1).
4.4.3 Deallocation
The deallocation algorithm cfm_free is shown in Listing 4.6. The method
get_page_block takes constant time, since it only accesses the memory location to
which the abstract address refers. The method get_page executes a fixed amount of
void cfm_free(abs_address) {
    page_block = get_page_block(abs_address);
    page = get_page(page_block);
    size_class = get_size_class(page);
    set_unused(page, page_block);
    add_free_page_block(page, page_block);
    add_free_abstract_address(abs_address);
    if (page == empty) {
        remove_page(size_class, page);
    }
    else {
        compaction(size_class, page);
    }
}
Listing 4.6: Deallocation - moving implementation
arithmetic operations, and therefore also takes constant time. Namely, pages are linearly
aligned in memory, so for a given page-block, we can calculate the beginning address
of its page. The method get_size_class is executed in constant time, since every
page contains a field Size-Class. The method set_unused changes the value of a single
bit in the bitmap, so it also requires constant time. Both add_free_page_block and
add_free_abstract_address amount to adding a new element to a corresponding free-