Page Access Tracking to Improve Memory Management

1

PATH: Page Access Tracking Hardware to Improve Memory Management

Reza Azimi, Livio Soares, Michael Stumm, Tom Walsh, and Angela Demke Brown

University of Toronto, Canada

2

Page Access Tracking Challenge


Storage Management Research


Many sophisticated algorithms


Most require accurate knowledge of the memory access trace


Adopted mostly for file systems or databases


Not straightforward for virtual memory



Problem: Limited Page Access Tracking


Hard to measure either Reuse Distance or Temporal Locality



Conventional Access Tracking Mechanisms


Monitoring page faults


Most page accesses are missed.



Scanning Page Table bits


High scanning overhead => low scanning frequency


3

Page Access Tracking Challenge (cont’d)


Access Tracking with Performance Counters


Statistical Data Sampling:


Favours only hot pages


Hard to track reuse distance or temporal locality



Recording TLB misses

- High overhead

TLBs are small (TLB misses are very frequent)

TLB miss handling is performance-critical



Hardware Approach [Zhou et al., ASPLOS’04]

+ Effective for its purpose (but inflexible)

- Impractical hardware resource requirements

~1 MB of hardware buffer per 1 GB of physical memory!



Software Approach [Yang et al., OSDI’06]

Dividing pages into active and inactive sets

Page-protecting members of the inactive set

- Overhead can still be too high



4

Page Access Tracking in Software













[Chart: performance of adaptive page replacement for FFT vs. runtime overhead of page access tracking in software — 10% overhead, even with a large active set, still gives poor performance; 90% overhead is needed to get acceptable performance]

5

Page Access Tracking Hardware (PATH)







Advantages


Extra hardware resources required are small (around 10KB)


Off the common path


Scalable (does not grow with physical memory)


[Figure: PATH design — on a TLB miss, the CPU core’s lookup of the virtual address (VADDR) in the page tables also appends an entry to the Page Access Buffer; when the buffer overflows, an interrupt hands the Page Access Log to software]

6

Information Provided by PATH







Raw Form


Abstraction: Precise LRU Stack


Abstraction: Miss Rate Curve (MRC)


7

Basic Abstraction: LRU Stack






Accessed and updated for each entry in the Page Access Log

Implementation:

Lookup:

Page Table-like structure

O(1) lookup time

Update:

Doubly linked list

A few pointers are updated for each page access
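A minimal Python sketch of this structure, for illustration only (the class and field names are invented; PATH’s software layer implements the equivalent inside the kernel). A dict plays the role of the page-table-like lookup structure, and explicit prev/next pointers form the doubly linked list, so each logged access touches only a few pointers:

```python
# Illustrative LRU stack: O(1) hashed lookup + doubly-linked-list update.
# Names are invented for this sketch; not the authors' implementation.

class Node:
    __slots__ = ("page", "prev", "next")
    def __init__(self, page):
        self.page = page
        self.prev = self.next = None

class LRUStack:
    def __init__(self):
        self.lookup = {}    # page number -> Node (page-table-like lookup)
        self.head = None    # top of the stack (most recently used page)

    def access(self, page):
        """Process one Page Access Log entry: move `page` to the top."""
        node = self.lookup.get(page)
        if node is None:
            node = Node(page)
            self.lookup[page] = node
        elif node is self.head:
            return                        # already on top: nothing to do
        else:
            # Unlink from its current position (a few pointer updates).
            if node.prev:
                node.prev.next = node.next
            if node.next:
                node.next.prev = node.prev
            node.prev = node.next = None
        # Push on top of the stack.
        node.next = self.head
        if self.head:
            self.head.prev = node
        self.head = node

stack = LRUStack()
for p in [1, 2, 3, 2]:
    stack.access(p)
# Stack from MRU to LRU is now 2, 3, 1.
```

The doubly linked list is what makes the update O(1): a node can be unlinked from the middle of the stack without any scan.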

8

Basic Abstraction: Miss Rate Curve (MRC)


Basic Info:


The number of misses for a given memory size in a period of time.



Basic Use:


Estimating the “memory needs” of an application.

9

Computing MRC Online


Mattson’s Stack Algorithm










For LRU:


Memory Sizes < LRU Distance: miss


Memory Sizes >= LRU Distance: hit


[Figure: LRU stack showing a page access with its LRU distance and MRU distance]
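The rule above can be sketched as a small Python illustration of Mattson’s stack algorithm: histogram each hit’s LRU stack distance, then read off the miss count for every memory size in one pass. The linear `list.index` scan stands in for the O(1) structures of the LRU-stack slide, and the tiny trace is made up:

```python
# Illustrative sketch of Mattson's stack algorithm for LRU (not the
# paper's kernel code). An access hits for every memory size >= its
# LRU stack distance and misses for every smaller size.

def miss_rate_curve(trace, max_pages):
    stack = []                          # LRU stack, index 0 = MRU
    hist = [0] * (max_pages + 1)        # hist[d] = hits at LRU distance d
    misses = 0                          # cold misses + overly deep reuses
    for page in trace:
        if page in stack:
            d = stack.index(page) + 1   # 1-based LRU distance
            if d <= max_pages:
                hist[d] += 1
            else:
                misses += 1             # deeper than any modelled size
            stack.remove(page)
        else:
            misses += 1                 # cold miss: infinite distance
        stack.insert(0, page)           # move/push to the MRU position
    total = len(trace)
    # misses(m) = always-misses + hits whose distance exceeds m
    return [(misses + sum(hist[m + 1:])) / total
            for m in range(1, max_pages + 1)]

mrc = miss_rate_curve([1, 2, 1, 3, 2, 1], max_pages=3)
# mrc == [1.0, 5/6, 0.5]
```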

10

Runtime Overhead Tradeoff







The larger the Page Access Buffer (active set):

The more page accesses are filtered

+ The less run-time overhead

- The less accurate the page access trace


11

Runtime Overhead, Example: FFT

[Chart: runtime overhead vs. number of Active Set entries]

12

Runtime Overhead, Example: LU non-contiguous

[Chart: runtime overhead vs. number of Active Set entries]

13

Runtime Overhead: Summary


Overall, a 2K-entry Page Access Buffer seems to be the best point in the tradeoff between performance and runtime overhead.



PATH’s overhead is less than 6% across a wide variety
of applications.



PATH’s overhead is negligible in most cases.


14

Case 1: Adaptive Page Replacement



Region-based Page Replacement


Use different replacement policies for different regions in
the virtual address space


Rationale: each region is likely to contain a data
structure with a fairly stable access pattern



Low Inter-reference Recency Set (LIRS)


Handles sequential and looping patterns


Requires tracking page accesses


Originally developed for file system caching


Easily enabled by the PATH-generated information



15

Region-based Replacement


Using MRC for comparison:


16

Region-based Replacement (cont’d)



Dividing Memory among Regions

Minimize total miss rate by giving memory to the regions that have more “benefit-per-page”.
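One way to read “benefit-per-page” is a greedy marginal-gain allocation over the per-region MRCs. The sketch below is illustrative only (region names and miss counts are made up; the paper’s exact policy may differ): each free page goes to the region whose MRC predicts the largest drop in misses.

```python
# Hedged sketch of "benefit-per-page" memory division: repeatedly grant
# the next free page to the region whose miss count drops the most.

def partition_memory(mrcs, total_pages):
    """mrcs: {region: [misses with 0 pages, with 1 page, ...]}."""
    alloc = {r: 0 for r in mrcs}
    for _ in range(total_pages):
        best, gain = None, 0
        for r, mrc in mrcs.items():
            n = alloc[r]
            if n + 1 < len(mrc):
                g = mrc[n] - mrc[n + 1]   # marginal benefit of one page
                if g > gain:
                    best, gain = r, g
        if best is None:
            break                         # no region benefits any more
        alloc[best] += 1
    return alloc

# Hypothetical MRCs: region "A" benefits steeply, "B" only mildly.
alloc = partition_memory({"A": [100, 40, 10, 5], "B": [50, 45, 40, 35]}, 4)
# alloc == {"A": 3, "B": 1}
```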










17

Simulation Results


LU-contiguous (SPLASH2)


18

Simulation Results

BT (NAS Benchmark)


19

Case 2: Prefetching


Spatial Locality-based

Prefetch pages spatially adjacent to the faulted page.


Advantages


Simple and easy to implement


Effective for many cases


Major drawback


Oblivious to non-spatial access patterns



Temporal Locality-based


Prefetch pages that are regularly accessed together.


Use PATH to track temporal locality of pages.


20

Temporal Locality-based Prefetching


Page Proximity Graph (PPG)


Each page is a node


There exists an edge from p to q if q is regularly accessed shortly after p (temporal locality)



PPG Update:


Add a page q to p’s proximity set if q repeatedly appears in the LRU stack in close proximity to p.



Basic prefetching scheme:

Breadth-first traversal starting from the faulted page.
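A toy sketch of the PPG update and the breadth-first prefetch, assuming a fixed proximity window and repetition threshold (both invented parameters; the paper’s actual policy may differ):

```python
# Illustrative Page Proximity Graph: an edge p -> q appears once q has
# been seen within WINDOW of p in the LRU stack REPEAT_THRESHOLD times;
# prefetch candidates come from a BFS rooted at the faulted page.

from collections import defaultdict, deque

REPEAT_THRESHOLD = 2   # invented: repetitions before an edge is added
WINDOW = 2             # invented: "close proximity" in the LRU stack

class PageProximityGraph:
    def __init__(self):
        self.counts = defaultdict(int)   # (p, q) -> co-occurrence count
        self.edges = defaultdict(set)    # p -> proximity set

    def update(self, lru_stack):
        """lru_stack: pages ordered from MRU to LRU."""
        for i, p in enumerate(lru_stack):
            for q in lru_stack[i + 1 : i + 1 + WINDOW]:
                self.counts[(p, q)] += 1
                if self.counts[(p, q)] >= REPEAT_THRESHOLD:
                    self.edges[p].add(q)

    def prefetch_set(self, faulted_page, budget):
        """Breadth-first traversal starting from the faulted page."""
        seen, order = {faulted_page}, []
        queue = deque([faulted_page])
        while queue and len(order) < budget:
            for q in sorted(self.edges[queue.popleft()] - seen):
                seen.add(q)
                order.append(q)
                queue.append(q)
        return order[:budget]

ppg = PageProximityGraph()
ppg.update([1, 2, 3])
ppg.update([1, 2, 3])      # second sighting crosses REPEAT_THRESHOLD
prefetch = ppg.prefetch_set(1, budget=2)
# prefetch == [2, 3]
```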

21

Prefetching


LU non-contiguous (SPLASH2)












22

Conclusions


Page Access Tracking Hardware


Small (10 KB in size)

Low-overhead


Generic



Cases Studied


Adaptive Page Replacement


Process Memory Allocation (See Paper)


Prefetching



Significant performance improvement can be
achieved by tracking page accesses.


23

Future Directions


Other case studies


NUMA page placement


Super-page management



Per-thread page access tracking


Augmenting page accesses with thread info



Multiprocessor issues


Combining traces collected on multiple CPUs


24

Questions