An Energy-Efficient Adaptive Hybrid Cache


2013/10/21 Yun-Chung Yang

An Energy-Efficient Adaptive Hybrid Cache

Jason Cong, Karthik Gururaj, Hui Huang, Chunyue Liu, Glenn Reinman, Yi Zou

Computer Science Department, University of California, Los Angeles

International Symposium on Low Power Electronics and Design (ISLPED), 2011



Abstract


Related Work


What’s the Problem


Run-time behavior


Set balancing


Proposed Method


Adaptive Hybrid Cache


Experiment Result


Conclusion

2


By reconfiguring part of the cache as software-managed scratchpad memory (SPM), hybrid caches manage to handle both unknown and predictable memory access patterns. However, existing hybrid caches provide a flexible partitioning of cache and SPM without considering adaptation to the run-time cache behavior. Previous cache set balancing techniques are either energy-inefficient or require serial tag and data array access. In this paper an adaptive hybrid cache is proposed to dynamically remap SPM blocks from high-demand cache sets to low-demand cache sets. This achieves 19%, 25%, 18% and 18% energy-runtime-product reductions over four previous representative techniques on a wide range of benchmarks.


3

4


Related work comparison:

Software-controlled hybrid caches that partition the cache into SPM, from way to block granularity: Column caching, FlexCache, Reconfigurable cache, Virtual Local Store [2], [3], [4], [5], [8]-[10]

Set balancing: Victim cache [11] (uses CAM memory), Balanced Cache [12] (serial tag/data access)

This paper: energy-efficient set utilization, no serial tag/data access needed


Previous hybrid cache designs partition the cache and SPM without adaptation to the run-time cache behavior.


Because SPM allocation is uniform while cache behavior is non-uniform, a hot cache set problem arises.



5


Adaptive Hybrid Cache


(a) Original Code


(b) Transformed Code for AH-cache


Compiler's job (sketched below)


(c) Memory space for AH-cache


(d) SPM blocks


Adaptive Mapping to cache


(e) SPM mapping in cache


(f) SPM mapping look-up table (SMLT)
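
A minimal sketch in C of the (a) to (b) transformation, assuming a simple array-summing kernel; the array name, the SPM base address and the "already copied into SPM" step are illustrative, not taken from the paper.

    #include <stddef.h>

    #define SPM_BASE ((int *)0x40000000)   /* hypothetical base of the SPM address space in (c) */

    /* (a) Original code: 'coeff' lives in normal cached memory. */
    static long sum_original(const int coeff[], size_t n) {
        long s = 0;
        for (size_t i = 0; i < n; i++)
            s += coeff[i];
        return s;
    }

    /* (b) Transformed code for AH-cache (the compiler's job, sketched):
       the compiler relocates 'coeff' into the SPM address range, so every
       access is served by the SPM blocks of (d)-(e) and needs no tag check. */
    static long sum_transformed(size_t n) {
        const int *coeff_spm = SPM_BASE;   /* data assumed already copied into SPM */
        long s = 0;
        for (size_t i = 0; i < n; i++)
            s += coeff_spm[i];
        return s;
    }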


6


Hardware for AH-cache


The green part (of the figure) is for accessing the SPM.


Cache addressing and the SMLT look-up are performed in parallel with the virtual address calculation in the pipeline architecture.
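
A behavioral sketch of that look-up in C: the SPM block number taken from the address indexes the SMLT, and the (set, way) pair it returns addresses the data array directly, with no tag comparison. The slide on the configuration later quotes a 9-bit entry (1 valid + 6-bit index + 2-bit way); for a 128-set, 2-way cache this sketch assumes a 7-bit set index and a 1-bit way, which also totals 9 bits. The struct and function names are mine.

    #include <stdint.h>
    #include <stdbool.h>

    /* One SMLT entry (9 bits of state). */
    typedef struct {
        unsigned valid : 1;
        unsigned set   : 7;
        unsigned way   : 1;
    } smlt_entry_t;

    static smlt_entry_t smlt[128];    /* one entry per SPM block */

    /* Behavioral model of an SPM access: index the SMLT with the SPM block
       number and use the returned (set, way) pair to address the data array
       directly.  In hardware this look-up runs in parallel with the virtual
       address calculation in the pipeline. */
    static bool spm_lookup(uint32_t spm_block, uint32_t *set, uint32_t *way) {
        smlt_entry_t e = smlt[spm_block & 0x7F];
        if (!e.valid)
            return false;             /* this SPM block is not currently mapped */
        *set = e.set;
        *way = e.way;
        return true;
    }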


7


Dynamically remap SPM blocks from high-demand cache sets to low-demand cache sets.


Migrate SPM blocks from high-demand sets to low-demand sets (see the sketch below).


The initial mapping of SPM blocks in the cache is random.
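
A behavioral sketch of one migration step in C, reusing the smlt[] table from the earlier sketch. demand[] stands in for the per-set demand that the victim tag buffer (introduced on the next slide) captures; the copy of the SPM block's data between the two cache locations is omitted. Names and the "hottest/coldest set" selection are illustrative.

    #define NUM_SETS 128

    static unsigned demand[NUM_SETS];              /* per-set demand estimate */

    /* Move one SPM block out of the hottest set into the coldest set and
       retarget its SMLT entry, freeing a way in the high-demand set. */
    static void remap_one_spm_block(void) {
        unsigned hot = 0, cold = 0;
        for (unsigned s = 1; s < NUM_SETS; s++) {
            if (demand[s] > demand[hot]) hot = s;
            if (demand[s] < demand[cold]) cold = s;
        }
        for (unsigned b = 0; b < 128; b++) {       /* find an SPM block mapped to the hot set */
            if (smlt[b].valid && smlt[b].set == hot) {
                smlt[b].set = cold;                /* the SPM block now occupies the cold set */
                break;
            }
        }
    }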

8


Goal: an application requires P SPM blocks while the AH-cache can provide at most Q SPM blocks, so there will be S = P - Q blocks to adaptively satisfy the high-demand cache sets.


Solution:


Use a victim tag buffer (VTB) to capture the demand of each set.


Floating block holder (FBH)


Records the cache sets that hold the floating blocks.


9


Re-insertion bit = 1 means this set is highly demanded and is re-inserted into the FBH queue (see the sketch below).
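
A sketch in C of the FBH queue together with the re-insertion bit: a set that reaches the head of the queue while its re-insertion bit is 1 is pushed back into the queue, as the slide states. The queue layout, the helper names and the "a non-hot set's floating block may be reassigned" policy are my reading of the slides, not statements from the paper.

    #include <stdint.h>
    #include <stdbool.h>

    #define FBH_SIZE 128

    static uint8_t fbh[FBH_SIZE];        /* 7-bit set indices of the floating-block holders */
    static unsigned fbh_head, fbh_tail, fbh_count;
    static bool reinsertion_bit[128];    /* set when the VTB sees high demand on a set */

    static void fbh_push(uint8_t set) {
        fbh[fbh_tail] = set;
        fbh_tail = (fbh_tail + 1) % FBH_SIZE;
        fbh_count++;
    }

    /* Pop the next set whose floating block may be reclaimed. */
    static int fbh_pop_candidate(void) {
        unsigned scans = fbh_count;                /* look at each queued set at most once */
        while (scans-- > 0) {
            uint8_t set = fbh[fbh_head];
            fbh_head = (fbh_head + 1) % FBH_SIZE;
            fbh_count--;
            if (reinsertion_bit[set]) {            /* still highly demanded: keep its block */
                reinsertion_bit[set] = false;
                fbh_push(set);                     /* re-insert into the FBH queue */
            } else {
                return set;                        /* its floating block may be reassigned */
            }
        }
        return -1;                                 /* every queued set is currently hot */
    }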

10


Problem: a worst case of S cycles of delay for the search, where S is the maximum size of the SPM.


Solution:


Store the re-insertion bits in a table called the re-insertion bit table (RIBT).


Search 16 re-insertion bits in parallel (see the sketch below).
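
A sketch of the RIBT search in C: the 128 re-insertion bits are packed into 8 entries of 16 bits, and one 16-bit entry is examined per step, so 16 bits are effectively searched in parallel instead of walking the queue one set at a time. The "find a set whose bit is 0" goal is my reading of the slides.

    #include <stdint.h>

    static uint16_t ribt[8];     /* bit s%16 of ribt[s/16] = re-insertion bit of set s */

    /* Return the index of a set whose re-insertion bit is 0 (not highly
       demanded), or -1 if every set is currently marked highly demanded. */
    static int ribt_find_cold_set(void) {
        for (unsigned entry = 0; entry < 8; entry++) {
            uint16_t word = ribt[entry];
            if (word != 0xFFFF) {                  /* at least one 0 bit among these 16 sets */
                for (unsigned bit = 0; bit < 16; bit++)
                    if (!(word & (1u << bit)))
                        return (int)(entry * 16 + bit);
            }
        }
        return -1;
    }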


11


Storage Overhead


Critical path of the SMLT table in the pipeline stage


Comparison with other designs (performance, miss rate, energy):


Non-adaptive hybrid cache (N)


Non-adaptive hybrid cache + balanced cache (B)


Non-adaptive hybrid cache + victim cache (Vp, Vs)


Phase-reconfigurable hybrid cache (R)


Adaptive hybrid cache (AH)


Statically optimized hybrid cache (S)


12

13


16KB, 2-way associative, 128 sets, 64B data block, 4B tag entry size


128 SPM blocks of 64B each


SMLT: 128 9-bit entries (1 valid + 6-bit index + 2-bit way)


Insertion flag + 4-bit counter


FBH queue: 128 7-bit entries


RIBT: 8 16-bit entries


Total: 0.4KB, 3% of the hybrid cache size
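
A rough tally of these structures in C, assuming one insertion flag + 4-bit counter per set (the slide does not state how many there are) and leaving the victim tag buffer out; it lands near the quoted 0.4KB total.

    /* Back-of-the-envelope storage check, in bits. */
    enum {
        SMLT_BITS = 128 * 9,          /* SMLT: 128 9-bit entries                       */
        FLAG_BITS = 128 * (1 + 4),    /* insertion flag + 4-bit counter per set (assumed) */
        FBH_BITS  = 128 * 7,          /* FBH queue: 128 7-bit entries                  */
        RIBT_BITS = 8 * 16,           /* RIBT: 8 16-bit entries                        */
        TOTAL_BYTES = (SMLT_BITS + FLAG_BITS + FBH_BITS + RIBT_BITS) / 8   /* 352 B, about 0.4KB */
    };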


14


32nm technology (cache block size is 64B)


The 0.2ns critical path fits in a 4GHz core (whose cycle time is 0.25ns).


15


R reduces cache misses by 34%.


AH-cache reduces cache misses by 52%.


AH-cache outperforms B because B allocates SPM uniformly without considering cache set demand.


The victim cache's benefit depends on its size.


AH-cache outperforms B, Vp, Vs and R by 3%, 4%, 8% and 12%, respectively.

16


Although the proposed method adds extra hardware (the SMLT, VTB and adaptive mapping unit), AH-cache still achieves energy reductions of 16%, 22%, 10% and 7% compared to designs B, Vp, Vs and R, respectively.

17


AH-cache dynamically remaps SPM blocks to cache blocks based on run-time behavior.


AH-cache achieves energy-runtime-product reductions of 19%, 25%, 18% and 18% over designs B, Vp, Vs and R.



My comment


The details are explained well.


The usage of the tag while in SPM mode should be mentioned.

18