Speculative Parallelization for the Polyflow Architecture


PaCo: Probability-Based Path Confidence Prediction

Kshitiz Malik, Mayank Agarwal, Vikram Dhar, Matthew Frank



Implicitly Parallel Architectures Group

University of Illinois at Urbana-Champaign


Summary


Path Confidence: likelihood of correct path


Pipeline Gating, SMT Fetch



Conventional: use count of low-conf branches


Inaccurate



PaCo: Directly estimates goodpath probability


Highly accurate, modest hardware


Improves performance on gating, SMT Fetch








HPCA-14, February 18, 2008


Outline


Overview


Motivation


Design


Evaluation


Applications


Branch Confidence Prediction


Branch Confidence: Single Branches


Low Conf / High Conf


Applications: Checkpointing, Multipath, etc.




Path Confidence Prediction


Path Confidence: likelihood of being on the correct path


Contributions from Multiple Branches


Conventionally: Count of (unexecuted) low-confidence branches.


Path Confidence: Applications


Path Confidence: Multiple Branches


Count of low-confidence branches.

Applications: Pipeline Gating, SMT Prioritization

Example: Gate-Count = 5 (fetch is gated when the count of pending low-confidence branches is >= 5)


Issues with Conventional Approach


Path Confidence: Multiple Branches


Count of low-confidence branches.

Applications: Pipeline Gating, SMT Prioritization

Problem: implicit assumptions that

High-conf branches never mispredict

All low-conf branches have the same misprediction probability




Path Confidence using PaCo


Directly estimates goodpath probability


Highly Accurate: RMS error 3.8%


Modest Hardware: 60 bytes of counters


Excellent Performance in applications



Gating:

        Badpath Instrs. (reduction)   Performance (loss)
Conv    7%                            0.1%
PaCo    32%                           0.01%

SMT Fetch Prioritization: performance improves by up to 23% (5.5% on average)





Branch Confidence Prediction


Classify branches as low conf. or high conf.


JRS predictor:


Count consecutive correct predictions


Below threshold: Low Confidence


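Table of Miss-Distance Counters (MDCs)

[Figure: Branch PC and Global Hist are combined to index the MDC Table of 4-bit counters. On branch execution the indexed counter is reset to 0 on a mispredict and incremented by 1 otherwise. Low confidence if MDC Value < Threshold.]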
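The JRS scheme on this slide can be sketched in a few lines of Python. This is only an illustration, not the hardware from the talk: the table size, the XOR index hash and the names `MDCTable`, `update` and `is_low_confidence` are assumptions made for the example. The 4-bit counter, the reset on a mispredict and the threshold test follow the slide.

```python
class MDCTable:
    """Table of 4-bit Miss-Distance Counters (resetting counters)."""

    MAX_MDC = 15  # a 4-bit counter saturates at 15

    def __init__(self, index_bits=10):  # table size is an assumption
        self.mask = (1 << index_bits) - 1
        self.counters = [0] * (1 << index_bits)

    def index(self, pc, global_hist):
        # Assumed hash: XOR of branch PC and global history.
        return (pc ^ global_hist) & self.mask

    def lookup(self, pc, global_hist):
        return self.counters[self.index(pc, global_hist)]

    def update(self, pc, global_hist, mispredicted):
        """Called on branch execution: reset on a mispredict, else count up."""
        i = self.index(pc, global_hist)
        if mispredicted:
            self.counters[i] = 0
        else:
            self.counters[i] = min(self.counters[i] + 1, self.MAX_MDC)

    def is_low_confidence(self, pc, global_hist, threshold=3):
        # Below threshold: low confidence.
        return self.lookup(pc, global_hist) < threshold
```

A branch therefore starts out low confidence and becomes high confidence only after `threshold` consecutive correct predictions.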


Conventional Path Confidence Prediction


Count of low-confidence branches = measure of path confidence

Threshold-and-Count Approach

Inaccurate

Coarseness

No relation to goodpath probability

[Figure: Branch -> MDC Table -> Miss Distance Counter Value (4 bits) -> Threshold Function -> Low Conf / High Conf (1 bit) -> Sum -> Path Confidence]
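As a rough sketch of the threshold-and-count pipeline, the functions below reduce each pending branch's MDC value to one bit and sum the bits. The function names are hypothetical. The default JRS threshold of 3 and gate-count of 5 are simply the example values that appear on other slides of this talk, not recommended settings.

```python
def threshold_and_count(pending_mdc_values, jrs_threshold=3):
    """Conventional path confidence: number of pending low-confidence branches."""
    return sum(1 for mdc in pending_mdc_values if mdc < jrs_threshold)


def should_gate_fetch(pending_mdc_values, gate_count=5, jrs_threshold=3):
    """Gate fetch once the low-confidence count reaches gate_count."""
    return threshold_and_count(pending_mdc_values, jrs_threshold) >= gate_count
```

Note that the 4-bit MDC value is collapsed to a single bit before the sum, so any difference in misprediction rate between, say, MDC = 0 and MDC = 2 is lost.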


Coarseness


All low-conf branches are treated as equal

E.g., SMT prioritization:

gcc: 1 branch (a) pending, goodpath prob. = 0.57

vortex: 2 branches (b) pending, goodpath prob. = 0.88 × 0.88 ≈ 0.78

Yet, fetch from gcc!

[Figure: Misprediction Rate (pct) vs. MDC Value for twolf, vortex, gcc and gzip. JRS Threshold = 3 separates Low Conf from High Conf; points a and b are marked on the curves.]
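The gcc/vortex example can be checked with a few lines of Python. The per-branch probabilities are the ones quoted on the slide. Treating branches as independent, so that probabilities multiply, is the same simplification the slide makes. The helper names are made up for this sketch.

```python
from math import prod

# Correct-prediction probability of each pending low-confidence branch.
pending = {
    "gcc": [0.57],           # one branch of kind "a"
    "vortex": [0.88, 0.88],  # two branches of kind "b"
}


def low_conf_count(probs):
    """What threshold-and-count sees: just the number of branches."""
    return len(probs)


def goodpath_probability(probs):
    """Probability that every pending branch was predicted correctly."""
    return prod(probs)


# Counting favours the thread with fewer low-confidence branches ...
count_choice = min(pending, key=lambda t: low_conf_count(pending[t]))
# ... while the probabilities favour the other thread.
prob_choice = max(pending, key=lambda t: goodpath_probability(pending[t]))
print(count_choice, prob_choice)  # prints: gcc vortex
```

With the rounded 0.88 figure the vortex product evaluates to about 0.77, still well above gcc's 0.57.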


Coarseness

All low-conf branches are treated as equal

High-conf branches are assumed not to mispredict

twolf, vortex: branches with MDC = 3 are classified high-conf

Don't affect path confidence

[Figure: Misprediction Rate (pct) vs. MDC Value for twolf, vortex, gcc and gzip, with JRS Threshold = 3 separating Low Conf from High Conf.]


Sum ≠ Probability

[Figure: Goodpath likelihood when 5 low-confidence branches are pending.]

Gating at count = 5

Too aggressive for gzip, not useful for route

SMT Fetch:

gzip and route both have conf = 5. Equal bandwidth?


Sum ≠ Probability


Hard to choose optimal gate-count

Different gate-counts for different benchmarks

Different gate-counts for different phases

Hard to compare path confidence of different applications

SMT Fetch Prioritization sub-optimal


Design


Calculate goodpath probability directly

Finding the 'correct prediction probability' for a branch

MDC table is a good differentiator of misprediction rates

Find the misprediction rate for each MDC value

No thresholding!

Other, more h/w-intensive approaches possible


Threshold-and-Count vs. PaCo









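[Figure: Threshold-and-count: Branch -> MDC Table -> Miss Distance Counter Value (4 bits) -> Threshold Function -> Low Conf / High Conf (1 bit) -> Sum -> Path Confidence]

[Figure: PaCo: Branch -> MDC Table -> Miss Distance Counter Value (4 bits) -> Mispredict Rate Calculator (MRT) -> Mispredict Probability -> Product -> Path Confidence. The Mispred Rate Table holds one Misprediction Rate per MDC Value (0, 1, 2, ..., 13, 14, 15).]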
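The PaCo data path in the figure can be sketched as follows. This is a floating-point illustration under stated assumptions, not the hardware described in the talk: the class and function names are invented, the rates are estimated from simple running counts, and a rate of 0.0 is assumed for an MDC value that has not been observed yet.

```python
class MispredictRateTable:
    """One misprediction rate per MDC value, learned from resolved branches."""

    def __init__(self, num_mdc_values=16):  # 4-bit MDC -> 16 entries
        self.correct = [0] * num_mdc_values
        self.mispred = [0] * num_mdc_values

    def record(self, mdc, mispredicted):
        """Account for a resolved branch that had the given MDC value."""
        if mispredicted:
            self.mispred[mdc] += 1
        else:
            self.correct[mdc] += 1

    def mispredict_rate(self, mdc):
        total = self.correct[mdc] + self.mispred[mdc]
        # Assumption: treat an unseen MDC value as never mispredicting.
        return self.mispred[mdc] / total if total else 0.0


def path_confidence(mrt, pending_mdc_values):
    """Goodpath probability: product of per-branch correct probabilities."""
    p = 1.0
    for mdc in pending_mdc_values:
        p *= 1.0 - mrt.mispredict_rate(mdc)
    return p
```

Unlike the one-bit threshold function, every MDC value contributes its own measured rate, so high-confidence branches also lower the path confidence slightly.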


Hardware Complexity


Floating point MUL and DIV required

Use logarithms, remove mul/div

Remove floating point: scale to integer values
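The two simplifications can be illustrated with a small fixed-point sketch: a product of probabilities becomes a sum of logarithms, and the logarithms are scaled to integers. The base-2 logarithm, the scale factor of 256 and the function names are assumptions made for this example. The 12-bit saturation borrows the width labelled "Encoded Probability" on the hardware figure.

```python
import math

SCALE = 256          # assumed fixed-point scale factor
MAX_ENCODED = 4095   # saturate at 12 bits


def encode(correct_prob):
    """Integer encoding of -log2(p); 0 means 'certainly correct'."""
    if correct_prob <= 0.0:
        return MAX_ENCODED
    return min(round(-math.log2(correct_prob) * SCALE), MAX_ENCODED)


def path_confidence_encoded(encoded_pending):
    """Integer add replaces the multiply; larger sum = lower goodpath probability."""
    return sum(encoded_pending)


def decode(encoded_sum):
    """Back to a probability, for inspection only."""
    return 2.0 ** (-encoded_sum / SCALE)
```

For example, two pending branches with correct probabilities 0.5 and 0.25 encode to 256 and 512. Their sum, 768, decodes to 0.125, which is the product of the two probabilities. Comparisons against a gating threshold or between threads can be made directly on the integer sums.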



PaCo Hardware

60 bytes of counters, 10-bit shift register

[Figure: PaCo hardware. Mispredict Rate Calculator: Correct Preds and Mispreds counters for each MDC value (MDC 0 to MDC 15), updated by feedback from the backend and followed by a Log Circuit. Path Confidence Predictor: Branch PC and Global Hist index the MDC Table; the MDC value selects an Encoded Probability (12 bits, one per MDC value), which is added into the Path Confidence. Other widths labelled in the figure: 6 bits, 10 bits.]

Evaluation: Prediction Accuracy









Benchmark   RMS Error
bzip2       0.055
crafty      0.053
gap         0.087
gcc         0.083
gzip        0.064
mcf         0.045
parser      0.042
perl        0.061
twolf       0.018
vortex      0.033
place       0.024
route       0.032
Mean        0.038

RMS Error = 0.038

Example:

Predicted 60% goodpath likelihood

Should be within (60 ± 3.8)% = 56.2% to 63.8%



Accuracy: Reliability Diagram

[Figure: Observed Path Confidence in percent (f) vs. Predicted Path Confidence in percent (x).]
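One way to turn a reliability diagram into a single RMS number is sketched below. The slides do not spell out the exact definition used in the talk, so the 1% bin width, the weighting of each bin by its sample count and the function name are all assumptions of this example.

```python
import math
from collections import defaultdict


def rms_error(predicted, on_goodpath):
    """RMS gap between predicted (x) and observed (f) goodpath likelihood.

    predicted:   predicted goodpath probabilities in [0, 1]
    on_goodpath: matching booleans, True if the sample really was on the goodpath
    """
    bins = defaultdict(lambda: [0, 0])  # percent -> [samples, goodpath samples]
    for p, good in zip(predicted, on_goodpath):
        b = bins[round(p * 100)]
        b[0] += 1
        b[1] += int(good)
    total = sum(n for n, _ in bins.values())
    if total == 0:
        return 0.0
    # Weight each bin's squared error by how many samples fell into it.
    sq = sum(n * (pct / 100 - g / n) ** 2 for pct, (n, g) in bins.items())
    return math.sqrt(sq / total)
```

A perfectly calibrated predictor lies on the diagonal of the diagram (f = x in every bin) and scores 0 under this measure.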


Applications: Pipeline Gating









[Figure: Reduction in Badpath Instructions Executed (pct) vs. Performance Loss (pct) for JRS 3, JRS 7, JRS 11 and JRS 15, with points labelled GateCount = 1, GateCount = 2 and GateCount = 10.]


[Figure: Reduction in Badpath Instructions Executed (pct) vs. Performance Loss (pct) for PaCo, JRS 3, JRS 7, JRS 11 and JRS 15.]

PaCo: 32% reduction in badpath instructions at 0.1% performance loss


Applications: SMT Fetch Prioritization

[Figure: IPC (harmonic mean) for jrs3, jrs7, jrs11, jrs15 and paco.]

Conclusions


Threshold-and-Count predictors are inaccurate

PaCo: Directly produces goodpath probability

Uses modest h/w by using logarithms

Highly accurate: low RMS error (3.8%)

PaCo does very well in Pipeline Gating and SMT Fetch Prioritization









Questions?









Backup 1: Comparison with WPUP

WPUP: Perfect fetch gating improves average performance by 2.3% (excl. mcf and parser)

Badpath Instructions

Good: prefetching (useful: small ROBs, wide machines)

Bad: BTB/Cache pollution (a problem for smaller BTBs/caches)

Prefetching much less useful with 512 ROB

                WPUP              PaCo
Machine Width   8                 4
ROB Size        128               256
Mem. Latency    300 cycles        100 cycles
BTB Size        4K entries        2K entries
Caches          64K L1, 1MB L2    32K L1, 512KB L2


Pipeline Parameters: Fetch Gating









Pipeline Parameters: SMT








