Memory Management


Jordan University of Science & Technology

CPE 746 Embedded Real-Time Systems

Prepared By: Salam Al-Mandil & Hala Obaidat

Supervised By: Dr. Lo’ai Tawalbeh

Outline

Introduction
Common Memory Types
Composing Memory
Memory Hierarchy
Caches
Application Memory Management
Static memory management
Dynamic memory management
Memory Allocation
The problem of fragmentation
Memory Protection
Recycling techniques
Introduction



An embedded system is a special-purpose computer system designed to perform one or a few dedicated functions, sometimes with real-time computing constraints.

An embedded system is part of a larger system.

Embedded systems often have small memories and are required to run for a long time, so memory management is a major concern when developing real-time applications.



Common Memory Types


RAM


DRAM:

Volatile memory.

Address lines are multiplexed: the first half of the address is sent first and latched by the row access strobe (RAS); the second half is sent later and latched by the column access strobe (CAS).

One capacitor and a single transistor per bit => better capacity.

Requires periodic refreshing every 10-100 ms => dynamic.

Cheaper per bit => lower cost.

Slower => used for main memory.




Reading DRAM Super cell (2,1)


Step 1(a): The row access strobe (RAS) selects row 2.

[Figure: a 16 x 8 DRAM chip with a 4 x 4 array of supercells (rows 0-3, cols 0-3) and an internal row buffer. The memory controller drives a 2-bit addr bus and an 8-bit data bus; RAS = 2 is placed on the addr lines.]

Step 1(b): Row 2 copied from DRAM array to
row buffer.

Reading DRAM Super cell (2,1)


Step 2(a): The column access strobe (CAS) selects column 1.

[Figure: the same 16 x 8 DRAM chip; CAS = 1 is placed on the addr lines, and the internal row buffer supplies the selected column.]

Step 2(b): Supercell (2,1) is copied from the row buffer to the data lines, and eventually back to the CPU.
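To make the multiplexed addressing concrete, here is a minimal C sketch (an illustration, not from the slides) that splits a flat supercell address into the row half latched by RAS and the column half latched by CAS, for a hypothetical 16 x 8 chip like the one above; the macro and variable names are assumptions.

```c
#include <stdio.h>

/* Hypothetical 16 x 8 DRAM chip: 16 supercells in a 4 x 4 array,
 * so 2 address bits select the row and 2 select the column. */
#define ROW_BITS 2
#define COL_BITS 2

int main(void) {
    unsigned addr = 9;                            /* supercell (2,1): 2*4 + 1 */
    unsigned row = addr >> COL_BITS;              /* sent first, latched by RAS */
    unsigned col = addr & ((1u << COL_BITS) - 1); /* sent second, latched by CAS */
    printf("addr %u -> RAS row %u, CAS col %u\n", addr, row, col);
    return 0;
}
```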

RAM


SRAM:

Volatile memory.

Six transistors per bit => lower capacity.

No refreshing required => faster & lower power consumption.

More expensive per bit => higher cost.

Faster => used in caches.


Some Memory Types


ROM:

Non-volatile memory.

Can be read from, but not written to, by a processor in an embedded system.

Traditionally written to, i.e. “programmed”, before being inserted into the embedded system.

Stores constant data needed by the system.

Horizontal lines = words, vertical lines = data.

Some embedded systems work without RAM, exclusively on ROM, because their programs and data are rarely changed.







Some Memory Types


Flash Memory:

Non-volatile memory.

Can be electrically erased & reprogrammed.

Used in memory cards and USB flash drives.

It is erased and programmed in large blocks at once, rather than one word at a time.

Examples of applications include PDAs, laptop computers, digital audio players, digital cameras, and mobile phones.

Type       | Volatile? | Writeable?                     | Erase Size  | Max Erase Cycles            | Cost (per Byte)            | Speed
-----------|-----------|--------------------------------|-------------|-----------------------------|----------------------------|-----------------------------------
SRAM       | Yes       | Yes                            | Byte        | Unlimited                   | Expensive                  | Fast
DRAM       | Yes       | Yes                            | Byte        | Unlimited                   | Moderate                   | Moderate
Masked ROM | No        | No                             | n/a         | n/a                         | Inexpensive                | Fast
PROM       | No        | Once, with a device programmer | n/a         | n/a                         | Moderate                   | Fast
EPROM      | No        | Yes, with a device programmer  | Entire chip | Limited (consult datasheet) | Moderate                   | Fast
EEPROM     | No        | Yes                            | Byte        | Limited (consult datasheet) | Expensive                  | Fast to read, slow to erase/write
Flash      | No        | Yes                            | Sector      | Limited (consult datasheet) | Moderate                   | Fast to read, slow to erase/write
NVRAM      | No        | Yes                            | Byte        | Unlimited                   | Expensive (SRAM + battery) | Fast

Composing Memory


When available memory is larger than needed, simply ignore unneeded high-order address bits and higher data lines.

When available memory is smaller than needed, compose several smaller memories into one larger memory:

Connect side-by-side.

Connect top to bottom.

Combine techniques.


Connect side-by-side

To increase the width of words.

[Figure: three 2^m x n ROMs connected side-by-side. They share address lines A0..Am and the enable signal; their outputs are concatenated into Q0..Q(3n-1), forming a 2^m x 3n ROM.]

Connect top to bottom

To increase the number of words.

[Figure: two 2^m x n ROMs stacked top to bottom. Address bits A0..A(m-1) go to both ROMs; the extra bit Am feeds a 1 x 2 decoder whose outputs drive the two enables, forming a 2^(m+1) x n ROM with outputs Q0..Q(n-1).]
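In software terms, the 1 x 2 decoder simply uses the extra high-order address bit to select a chip. Below is a minimal C sketch of that decoding; m = 4 and the rom0/rom1 names are assumptions for illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* Two hypothetical 2^m x n ROMs (m = 4, n = 8) composed top to bottom
 * into one 2^(m+1) x 8 memory. Bit A_m plays the role of the 1 x 2
 * decoder; bits A_0..A_(m-1) address a word inside the selected chip. */
#define M 4

static uint8_t rom0[1 << M];   /* words 0 .. 2^m - 1       */
static uint8_t rom1[1 << M];   /* words 2^m .. 2^(m+1) - 1 */

static uint8_t read_word(unsigned addr) {
    unsigned chip_select = (addr >> M) & 1;    /* decoder input: bit A_m  */
    unsigned offset = addr & ((1u << M) - 1);  /* low bits A_0 .. A_(m-1) */
    return chip_select ? rom1[offset] : rom0[offset];
}

int main(void) {
    rom0[3] = 0xAA;
    rom1[3] = 0xBB;
    printf("addr 3  -> %#x\n", read_word(3));   /* served by rom0 */
    printf("addr 19 -> %#x\n", read_word(19));  /* 19 = 16 + 3, served by rom1 */
    return 0;
}
```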

Combine techniques

To increase both the number and the width of words.

[Figure: a decoder on the high-order address bits enables one row of side-by-side ROMs; address lines A, the enables, and the outputs are wired as in the two techniques above.]

Memory Hierarchy


An approach for organizing memory and storage systems.

A memory hierarchy is organized into several levels, each smaller, faster, & more expensive per byte than the next lower level.

For each k, the faster, smaller device at level k serves as a cache for the larger, slower device at level k+1.

Programs tend to access the data at level k more often than they access the data at level k+1.




An Example Memory Hierarchy

(Smaller, faster, and costlier per byte toward the top; larger, slower, and cheaper per byte toward the bottom.)

L0: registers. CPU registers hold words retrieved from the L1 cache.

L1: on-chip L1 cache (SRAM). Holds cache lines retrieved from the L2 cache.

L2: off-chip L2 cache (SRAM). Holds cache lines retrieved from main memory.

L3: main memory (DRAM). Holds disk blocks retrieved from local disks.

L4: local secondary storage (local disks). Holds files retrieved from disks on remote network servers.

L5: remote secondary storage (distributed file systems, Web servers).

Caches


Cache: the first level(s) of the memory hierarchy encountered once the address leaves the CPU.

The term is used generally whenever buffering is employed to reuse commonly occurring items, such as webpage caches, file caches, & name caches.


Caching in a Memory Hierarchy

[Figure: the larger, slower, cheaper storage device at level k+1 is partitioned into blocks 0-15. The smaller, faster, more expensive device at level k caches a subset of those blocks (here 8, 9, 14, and 3). Data is copied between levels in block-sized transfer units; requests for blocks 14 and 12 are shown arriving at level k.]

General Caching Concepts


A program needs object d, which is stored in some block b.

Cache hit: the program finds b in the cache at level k. E.g., block 14.

Cache miss: b is not at level k, so the level k cache must fetch it from level k+1. E.g., block 12.

If the level k cache is full, then some current block must be replaced (evicted). Which one is the “victim”? We’ll see later.


Cache Placement



There are 3 categories of cache organization:

1. Direct-mapped.

2. Fully-associative.

3. Set-associative.

Direct-Mapped

The block can appear in 1 place only.

Fastest & simplest organization, but the highest miss rate due to contention.

Mapping is usually: block address % number of blocks in the cache.

[Figure: the address is split into tag, index, and offset fields. The index selects one (valid, tag, data) entry, and the stored tag is compared (=) against the address tag.]
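Because the block count is a power of two, the modulo mapping above reduces to bit slicing. A minimal C sketch of a direct-mapped lookup follows; the geometry (64 blocks of 16 bytes) and the names are assumptions for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical direct-mapped cache: 64 blocks of 16 bytes each. */
#define OFFSET_BITS 4
#define INDEX_BITS  6
#define NBLOCKS     (1u << INDEX_BITS)

struct line { bool valid; uint32_t tag; uint8_t data[1 << OFFSET_BITS]; };
static struct line cache[NBLOCKS];

/* A hit requires the indexed line to be valid and its tag to match. */
static bool lookup(uint32_t addr) {
    uint32_t index = (addr >> OFFSET_BITS) & (NBLOCKS - 1); /* block addr % #blocks */
    uint32_t tag   = addr >> (OFFSET_BITS + INDEX_BITS);
    return cache[index].valid && cache[index].tag == tag;
}

int main(void) {
    uint32_t addr = 0x1234;
    uint32_t index = (addr >> OFFSET_BITS) & (NBLOCKS - 1);
    cache[index].valid = true;                    /* install the block ... */
    cache[index].tag = addr >> (OFFSET_BITS + INDEX_BITS);
    return lookup(addr) ? 0 : 1;                  /* ... then hit: exit 0  */
}
```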

Fully-Associative

The block can appear anywhere in the cache. Slowest organization, but the lowest miss rate.

[Figure: the address is split into tag and offset only; the address tag is compared (=) in parallel against the tag of every valid (valid, tag, data) entry.]

Set-Associative

The block can appear anywhere within a single set (n-way set associative).

The set number is usually: block address % number of sets in the cache.

[Figure: the address is split into tag, index, and offset. The index selects a set, and the address tag is compared (=) against the tag of each valid (valid, tag, data) entry in that set.]
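The same sketch extended to an n-way organization: the index now selects a set, and every way in that set is checked (in parallel in hardware, a loop here). The 4-way, 16-set geometry is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 4-way set-associative cache: 16 sets, 16-byte blocks.
 * Set number = block address % number of sets. */
#define WAYS        4
#define SET_BITS    4
#define OFFSET_BITS 4
#define NSETS       (1u << SET_BITS)

struct line { bool valid; uint32_t tag; };
static struct line cache[NSETS][WAYS];

static bool lookup(uint32_t addr) {
    uint32_t set = (addr >> OFFSET_BITS) & (NSETS - 1);
    uint32_t tag = addr >> (OFFSET_BITS + SET_BITS);
    for (int w = 0; w < WAYS; w++)      /* the block may be in any way of its set */
        if (cache[set][w].valid && cache[set][w].tag == tag)
            return true;
    return false;
}

int main(void) { return lookup(0x40) ? 1 : 0; } /* empty cache: a miss, exit 0 */
```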

Cache Replacement


In a direct-mapped cache, only 1 block is checked for a hit, & only that block can be replaced.

For set-associative or fully-associative caches, the evicted block is chosen using one of three strategies:

Random.

LRU.

FIFO.

Cache Replacement



As associativity increases => LRU becomes harder & more expensive to implement => LRU is approximated.

LRU & random perform almost equally for larger caches, but LRU outperforms the others for small caches.


Write Policies

1. Write back: the information is written only to the block in the cache.

2. Write through: the information is written to both the block in the cache & the block in lower levels.

Reducing the Miss Rate



1. Larger block sizes & caches.

2. Higher associativity.



Application Memory Management



Allocation: allocating portions of memory to programs at their request.

Recycling: freeing memory for reuse when it is no longer needed.

Memory Management



In many embedded systems, the kernel and application programs execute in the same space, i.e., there is no memory protection.

Embedded operating systems therefore make a large effort to reduce their memory occupation size.


Memory Management


An RTOS keeps its memory size small by including only the functionality necessary for an application.

We have two kinds of memory management:

Static

Dynamic


Static memory management



Provides tasks with temporary data space.

The system’s free memory is divided into a pool of fixed-size memory blocks.

When a task finishes using a memory block, it must return it to the pool.
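Below is a minimal C sketch of such a fixed-size block pool; the block size, block count, and function names are illustrative assumptions, not a particular RTOS API.

```c
#include <stddef.h>

#define BLOCK_SIZE 32
#define NUM_BLOCKS 16

/* Each free block stores the link to the next free block inside itself. */
union block { union block *next; unsigned char data[BLOCK_SIZE]; };

static union block pool[NUM_BLOCKS];
static union block *free_list;

static void pool_init(void) {
    for (int i = 0; i < NUM_BLOCKS; i++) {  /* chain all blocks together */
        pool[i].next = free_list;
        free_list = &pool[i];
    }
}

static void *pool_alloc(void) {             /* O(1): pop the list head */
    union block *b = free_list;
    if (b) free_list = b->next;
    return b;                               /* NULL when the pool is empty */
}

static void pool_free(void *p) {            /* the task returns its block */
    union block *b = p;
    b->next = free_list;
    free_list = b;
}

int main(void) {
    pool_init();
    void *a = pool_alloc();
    pool_free(a);
    return 0;
}
```

Because every block has the same size, allocation and release are constant-time operations, and holes of awkward sizes cannot develop inside the pool.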





Another way to provide temporary space for tasks is via priorities:

A high-priority pool: sized for the worst-case memory demand of the system.

A low-priority pool: given the remaining free memory.





Dynamic memory management



Employs memory swapping, overlays, multiprogramming with a fixed number of tasks (MFT), multiprogramming with a variable number of tasks (MVT), and demand paging.

Overlays allow programs larger than the available memory to be executed by partitioning the code and swapping the parts between disk and memory.






MFT: a fixed number of equal-sized code partitions are in memory at the same time.

MVT: like MFT, except that the size of each partition depends on the needs of the program.

Demand paging: uses fixed-size pages that reside in non-contiguous memory, unlike the partitions in MFT and MVT.






Memory Allocation


Memory allocation is the process of assigning blocks of memory on request.

Memory for user processes is divided into multiple partitions of varying sizes.

Hole: a block of available memory.




Static memory allocation


Means that all memory is allocated to each process or thread when the system starts up, so you never have to ask for memory while a process is executing. This is very costly in memory.

The advantage of this in embedded systems is that the whole class of memory-related bugs, due to leaks, failures, and dangling pointers, simply does not exist.



Dynamic Storage-Allocation

How to satisfy a request of size n from a list of free holes: during runtime, a process asks the system for a memory block of a certain size to hold a certain data structure.

Some RTOSs support a timeout function on a memory request: you ask the OS for memory within a prescribed time limit.










Dynamic Storage-Allocation Schemes

First-fit: allocate the first hole that is big enough, so it is fast.

Best-fit: allocate the smallest hole that is big enough; this must search the entire list, unless the list is ordered by size.

Buddy: divides memory into partitions to try to satisfy a memory request as suitably as possible.
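A minimal C sketch of first-fit and best-fit over a singly linked list of free holes; the hole structure is an assumption for illustration, and splitting the chosen hole is omitted.

```c
#include <stddef.h>

/* A free hole, kept on a singly linked free list. */
struct hole { size_t size; struct hole *next; };

/* First-fit: stop at the first hole big enough, so it is fast. */
static struct hole *first_fit(struct hole *list, size_t n) {
    for (struct hole *h = list; h; h = h->next)
        if (h->size >= n)
            return h;
    return NULL;                        /* no hole fits */
}

/* Best-fit: scan the entire list for the smallest hole that fits. */
static struct hole *best_fit(struct hole *list, size_t n) {
    struct hole *best = NULL;
    for (struct hole *h = list; h; h = h->next)
        if (h->size >= n && (!best || h->size < best->size))
            best = h;
    return best;
}

int main(void) {
    struct hole c = { 64, NULL }, b = { 16, &c }, a = { 100, &b };
    /* For a request of 50: first-fit takes the 100-byte hole,
     * best-fit walks the whole list and takes the 64-byte hole. */
    return (first_fit(&a, 50) == &a && best_fit(&a, 50) == &c) ? 0 : 1;
}
```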



Buddy memory allocation


Allocates memory in powers of 2.

It only allocates blocks of certain sizes.

Has many free lists, one for each permitted size.


How buddy works?

If memory is to be allocated:

1. Look for a memory slot of a suitable size (the minimal 2^k block that is larger than the requested memory).

2. If it is found, it is allocated to the program.

3. If not, the system tries to make a suitable memory slot by doing the following:

Split a free memory slot larger than the requested memory size in half.

If the lower limit is reached, then allocate that amount of memory.

Go back to step 1 (look for a memory slot of a suitable size).

Repeat this process until a suitable memory slot is found.

How buddy works?

If memory is to be freed:

1. Free the block of memory.

2. Look at the neighboring block: is it free too?

3. If it is, combine the two, and go back to step 2. Repeat this process until either the upper limit is reached (all memory is freed) or a non-free neighbor block is encountered.
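Finding the "neighboring block" (the buddy) in step 2 is cheap: for a block of a given power-of-two size, the buddy's offset differs in exactly one bit, so it can be computed with an XOR. A sketch, assuming offsets are measured from the start of the managed region:

```c
#include <stddef.h>

/* Offset (from the pool base) of the buddy of the block at `offset`
 * with power-of-two size `size`. Flipping the size bit moves to the
 * other half of the enclosing block of size 2*size. */
static size_t buddy_of(size_t offset, size_t size) {
    return offset ^ size;
}

int main(void) {
    /* The two 64K halves of the 128K block at offset 0 are buddies. */
    return (buddy_of(0, 65536) == 65536 && buddy_of(65536, 65536) == 0) ? 0 : 1;
}
```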


Example: buddy system (1024K total, drawn as 16 columns of 64K)

t=0: | 1024K (free) |
t=1: | A-64K | 64K | 128K | 256K | 512K |
t=2: | A-64K | 64K | B-128K | 256K | 512K |
t=3: | A-64K | C-64K | B-128K | 256K | 512K |
t=4: | A-64K | C-64K | B-128K | D-128K | 128K | 512K |
t=5: | A-64K | 64K | B-128K | D-128K | 128K | 512K |
t=6: | 128K | B-128K | D-128K | 128K | 512K |
t=7: | 256K | D-128K | 128K | 512K |
t=8: | 1024K (free) |

The problem of fragmentation



Neither first fit nor best fit is clearly better than the other in terms of storage utilization, but first fit is generally faster.

All of the previous schemes have external fragmentation.

The buddy memory system has little external fragmentation.
Fragmentation

External fragmentation: total memory space exists to satisfy a request, but it is not contiguous.

Internal fragmentation: allocated memory may be slightly larger than the requested memory; this size difference is memory internal to a partition, but not being used.

[Figure: example of internal fragmentation.]

Memory Protection



It may not be acceptable for a hardware failure to corrupt data in memory, so the use of a hardware protection mechanism is recommended.

This hardware protection mechanism can be found in the processor or in the MMU.

MMUs also enable address translation, which is not needed in real-time systems because we use cross-compilers that generate position-independent code (PIC).

[Figure: hardware memory protection.]


Recycling techniques


There are many ways for automatic memory managers to determine which memory is no longer required.

Garbage collection relies on determining which blocks are not pointed to by any program variables.

Recycling techniques


Tracing collectors: automatic memory managers that follow pointers to determine which blocks of memory are reachable from program variables.

Reference counts: a count of how many references (that is, pointers) there are to a particular memory block from other blocks.



Example : Tracing collectors


Mark-sweep collection:

Phase 1: all blocks that can be reached by the program are marked.

Phase 2: the collector sweeps all allocated memory, searching for blocks that have not been marked. If it finds any, it returns them to the allocator for reuse.
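A toy C sketch of the two phases; the fixed four-object heap and the single outgoing pointer per object are simplifying assumptions, not a real collector.

```c
#include <stdbool.h>
#include <stddef.h>

#define NOBJ 4

/* Toy heap: each object may point to at most one other object. */
struct obj { struct obj *ref; bool marked; bool in_use; };
static struct obj heap[NOBJ];

/* Phase 1: mark every block reachable from a program variable. */
static void mark(struct obj *o) {
    while (o && !o->marked) {            /* follow the pointer chain */
        o->marked = true;
        o = o->ref;
    }
}

/* Phase 2: sweep all allocated memory, reclaiming unmarked blocks. */
static void sweep(void) {
    for (int i = 0; i < NOBJ; i++) {
        if (heap[i].in_use && !heap[i].marked)
            heap[i].in_use = false;      /* return the block for reuse */
        heap[i].marked = false;          /* reset for the next collection */
    }
}

int main(void) {
    heap[0] = (struct obj){ &heap[1], false, true };  /* root -> 0 -> 1 */
    heap[1] = (struct obj){ NULL,     false, true };
    heap[2] = (struct obj){ NULL,     false, true };  /* unreachable */
    mark(&heap[0]);                      /* heap[0] is the only root */
    sweep();
    return (!heap[2].in_use && heap[1].in_use) ? 0 : 1; /* 2 freed, 1 kept */
}
```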


Mark-sweep collection

The drawbacks:

It must scan the entire memory in use before any memory can be freed.

It must run to completion or, if interrupted, start again from scratch.



Example : Reference counts


Simple reference counting:

A reference count is kept for each object.

The count is incremented for each new reference, and decremented if a reference is overwritten or if the referring object is recycled.

If a reference count falls to zero, then the object is no longer required and can be recycled.

It is hard to implement efficiently because of the cost of updating the counts.
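A minimal C sketch of simple reference counting; the object layout and the retain/release names are illustrative assumptions.

```c
#include <stdlib.h>

/* An object with an embedded reference count. */
struct object { int refcount; /* ... payload ... */ };

static struct object *obj_new(void) {
    struct object *o = malloc(sizeof *o);
    if (o) o->refcount = 1;            /* the creator holds one reference */
    return o;
}

static void obj_retain(struct object *o) {
    o->refcount++;                     /* a new reference now points at o */
}

static void obj_release(struct object *o) {
    if (--o->refcount == 0)            /* count fell to zero: recycle */
        free(o);
}

int main(void) {
    struct object *o = obj_new();
    obj_retain(o);                     /* a second pointer refers to o */
    obj_release(o);                    /* that reference is overwritten */
    obj_release(o);                    /* creator's reference dropped: freed */
    return 0;
}
```

Note how every pointer copy and overwrite touches the count; this is the update cost mentioned above.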



References:

http://www.memorymanagement.org/articles/recycle.html

http://www.dedicated-systems.com

http://www.Wikipedia.org

http://www.cs.utexas.edu

http://www.netrino.com

S. Baskiyar and N. Meghanathan, “A Survey of Contemporary Real-time Operating Systems.”