Prediction of CPU Idle-Busy Activity Pattern


Authors: Qian Diao, Justin Song


Presented by: Justin Song

Intel Corporation

14th International Symposium on High-Performance Computer Architecture

Salt Lake City, UT - Feb 18, 2008

Agenda


Introduction


Usage model


CPU idle pattern


Prediction algorithm


Result


Benefit analysis


Summary & future work

Problem


C-state: CPU idle state (no instructions being executed)

C-state based CPU power management: potentially big benefit

Workloads rarely saturate a multi-core CPU

C-state technology is maturing

Lower power, higher compute efficiency, silicon support

How C-states are used today: broken

Today, only OSPM selects the C-state for a logical CPU (core/thread)

Many wrong decisions

Performance regression, or power waste

Performance concerns may prevent deep C-states from being enabled

[Figure: ACPI table]

Linux C-state policy

Case last C:

C1: 4 consecutive idles > C2.lat, choose C2 for next C

C2: 10 consecutive idles > C3.lat, choose C3 for next C

C2/C3: last idle < C2/C3.lat, demote
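
The history-based promotion and demotion rules above can be sketched as a small state machine. This is only an illustration of the three rules as listed on the slide, not the actual Linux implementation: the latency constants, the class name `HistoryPolicy`, and the choice to reset the streak counter on every transition are all assumptions.

```python
from dataclasses import dataclass

# Hypothetical exit latencies in microseconds; real values come from the ACPI table.
C2_LATENCY_US = 10.0
C3_LATENCY_US = 100.0
PROMOTE_C1_TO_C2 = 4    # consecutive long idles needed to leave C1
PROMOTE_C2_TO_C3 = 10   # consecutive long idles needed to leave C2


@dataclass
class HistoryPolicy:
    """Picks the next C-state from the length of the idle period that just ended."""
    state: str = "C1"
    streak: int = 0  # consecutive idles longer than the next deeper state's latency

    def next_state(self, last_idle_us: float) -> str:
        if self.state == "C1":
            # Count consecutive idles that would have amortized C2's latency.
            self.streak = self.streak + 1 if last_idle_us > C2_LATENCY_US else 0
            if self.streak >= PROMOTE_C1_TO_C2:
                self.state, self.streak = "C2", 0
        elif self.state == "C2":
            if last_idle_us < C2_LATENCY_US:
                self.state, self.streak = "C1", 0  # demote
            else:
                self.streak = self.streak + 1 if last_idle_us > C3_LATENCY_US else 0
                if self.streak >= PROMOTE_C2_TO_C3:
                    self.state, self.streak = "C3", 0
        else:  # currently C3
            if last_idle_us < C3_LATENCY_US:
                self.state, self.streak = "C2", 0  # demote
        return self.state
```

Because the decision depends only on past idle lengths, a sudden change in workload timing is noticed only after the fact, which is the source of the wrong decisions the slide mentions.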

GOOD Prediction Helps


No worry about performance drop

Possible causes for deep C-states to degrade performance

Upcoming C0% too high (e.g. >90%); no headroom to accommodate a deep C-state

(Equivalent statement: upcoming idle duration too short; the deep C-state's latency cannot be amortized)

A deep C-state, under some circumstances, prevents proprietary silicon optimizations for performance compensation from working

Thread context loss

On-core/package cache flush

Deep C-state's power benefit maximized

Our Methodology


Modeling problem


Use easy-to-observe metrics

Need domain knowledge assistance (silicon PM optimization)

Prediction model: DBN (Dynamic Bayesian Network)

Generalization of HMM and LDS (KFM)

Combines a natural mechanism for expressing domain knowledge with efficient algorithms for learning and inference


Model evaluation


Model simplification


For deployment in SW/FW/HW


Power benefit / perf impact quantification

Usage Model

Use the activity prediction result to direct C-state usage and performance compensation

CPU Package Activity State


Package activity


All cores' idle-busy activity

All-core-idle

All-core-busy

Package partial idle (at least one core idle and one core busy)

All other 2^N - 2 states (N = # of cores)


How PM benefits from the definition


Idle-busy pattern (not the OSPM-selected C-state) reflects the workload's timing nature

Aligned with shared-power-plane design

Only when all cores are idle can the package's memory and I/O control logic go to a lower power state

Only when at least one core is idle can the active cores' performance possibly be compensated

Breakdown of package partial idle

Core location information

[Figure: quad-core CPU package activity state change over time]
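
To make the state definitions above concrete, the following sketch classifies a per-core busy vector into the three package categories and counts the partial-idle states for a given core count. The function names and the convention that `True` means busy are assumptions made for illustration.

```python
from itertools import product


def classify_package_state(core_busy):
    """core_busy: one boolean per core, True meaning the core is busy (assumed convention)."""
    if not any(core_busy):
        return "all-core-idle"
    if all(core_busy):
        return "all-core-busy"
    return "partial-idle"


def count_partial_idle_states(n_cores):
    """Enumerate all 2^N busy/idle combinations and count the partial-idle ones."""
    return sum(
        1
        for state in product((False, True), repeat=n_cores)
        if classify_package_state(state) == "partial-idle"
    )
```

For a quad-core package, `count_partial_idle_states(4)` enumerates 16 combinations and leaves 16 - 2 = 14 partial-idle states, matching the 2^N - 2 count on the slide.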

CPU Idle Pattern


Definition: residency% of each package activity state during an observation time slot

How prediction benefits from the definition

Prediction of package idle pattern: random variable becomes discrete

Prediction of idle duration: hard to use a discrete prediction model

Single-core idle duration prediction cannot help whole-package power saving and performance compensation

Hard to know whether the cores' idle periods overlap


[Figure: Woodcrest activity pattern time distribution. Dual-core CPU package activity pattern over time; x-axis: time slot (1 to 29); y-axis: % (0% to 100%); series: activity-state (0,0), (0,1), (1,0), (1,1)]
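
As an illustration of the residency definition above, the sketch below turns the activity samples collected in one observation time slot into a per-state residency fraction. It assumes equally spaced samples and encodes each core as 0 or 1 in the style of the figure's (0,0) to (1,1) labels; which value means busy is not stated on the slide, and the function name `idle_pattern` is hypothetical.

```python
from collections import Counter
from itertools import product


def idle_pattern(samples, n_cores):
    """Residency fraction of every package activity state within one time slot.

    samples: list of per-core tuples such as (0, 1), taken at equal intervals (assumption).
    """
    if not samples:
        raise ValueError("need at least one sample per time slot")
    counts = Counter(samples)
    total = len(samples)
    # Report all 2^N states, including those never observed in this slot.
    return {
        state: counts.get(state, 0) / total
        for state in product((0, 1), repeat=n_cores)
    }
```

The resulting fractions sum to 1 within each slot, which is why only 2^N - 1 of them are independent.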

Prediction Algorithm


Kalman Filter Model used for prediction


Time series (observed CPU package patterns) is a Markov process

Observation made every 500 µs


KFM generalized in Dynamic Bayesian Networks


Explicit probability definition (Bayesian theory)


Good network structure description (graph theory)


Algorithm


Inputs


T observed history CPU package patterns. Each state's percentage series is defined as an independent variable.

A priori state transition, deviation, observation covariance


Interim outputs


Hidden conditional probability distribution


Final outputs


Prediction for the (T+1)th CPU package pattern

Inference

Forward operator (1 to T)

Backward operator (T+1 back to 1)

Complexity (T: # of history observations; N: # of activity states)


O(TN^3)
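
Since each state's percentage series is treated as an independent variable, the forward part of the inference can be sketched as a one-dimensional Kalman filter run over the T observations, followed by a one-step prediction. This is a simplified illustration, not the authors' model: it omits the backward (smoothing) operator, and the transition coefficient `a`, the noise variances `q` and `r`, and the priors `x0` and `p0` are placeholder assumptions.

```python
def kalman_predict_next(observations, a=1.0, q=1e-3, r=1e-2, x0=0.0, p0=1.0):
    """Forward-filter a residency series and predict the value for slot T+1.

    Assumed model: x_t = a * x_{t-1} + w (variance q), y_t = x_t + v (variance r).
    Returns (predicted value, predicted variance).
    """
    if not observations:
        raise ValueError("need at least one observation")
    x, p = x0, p0
    for y in observations:
        # Predict step: propagate the previous estimate through the transition.
        x_pred = a * x
        p_pred = a * p * a + q
        # Update step: blend the prediction with the new observation.
        gain = p_pred / (p_pred + r)
        x = x_pred + gain * (y - x_pred)
        p = (1.0 - gain) * p_pred
    # One more predict step gives the estimate for the next time slot.
    return a * x, a * p * a + q
```

Running this once per activity state on a history of residency values yields one predicted residency per state for the next 500 µs slot.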

Algorithm Simplification


2^N states reduced to 3 states (all busy, all idle, partial idle)

One-step forward and backward computation

Forward: store step (T-1)'s intermediate results

Backward: compute only step (T+1)

Complexity of simplified algorithm

Best case: O(1)

Worst case: O(T), when the stored intermediate results must be discarded and the computation restarted

[Figure: co-processor based prediction time estimate]
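
The two simplifications above can be sketched together: first collapse a full 2^N-state pattern into the three aggregated states, then keep the filter's intermediate result between slots so that each new observation costs constant time. The names `aggregate_pattern` and `IncrementalPredictor`, the scalar filter form, and its parameters are assumptions for illustration; the convention that 1 means busy is also assumed.

```python
def aggregate_pattern(pattern):
    """Collapse {per-core tuple: fraction} into all-idle, all-busy, and partial-idle."""
    all_idle = sum(f for s, f in pattern.items() if not any(s))
    all_busy = sum(f for s, f in pattern.items() if all(s))
    partial = sum(f for s, f in pattern.items() if any(s) and not all(s))
    return {"all_idle": all_idle, "all_busy": all_busy, "partial_idle": partial}


class IncrementalPredictor:
    """Scalar filter that stores the previous step's result, so each step is O(1)."""

    def __init__(self, a=1.0, q=1e-3, r=1e-2, x0=0.0, p0=1.0):
        self.a, self.q, self.r = a, q, r
        self.x, self.p = x0, p0  # intermediate results carried between slots

    def step(self, y):
        """Fold in the newest observation and return the prediction for the next slot."""
        x_pred = self.a * self.x
        p_pred = self.a * self.p * self.a + self.q
        gain = p_pred / (p_pred + self.r)
        self.x = x_pred + gain * (y - x_pred)
        self.p = (1.0 - gain) * p_pred
        return self.a * self.x
```

If the stored state has to be thrown away, replaying the T history observations through `step` rebuilds it, which corresponds to the O(T) worst case.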

Result

Figure annotations:

Smoothed value follows the observed value very well

Predicted value

Ground-truth value

Distance from the ground truth is the prediction error

For DP CPU, 4 variables: (busy, busy)%, (busy, idle)%, (idle, busy)%, (idle, idle)%; 3 of them are independent; no aggregation for partial idles

Result (cont'd)

Data set                                 State number   Mean square error of prediction   Relative error
Dual core                                4              0.0387                            4.08%
Quad core                                16             0.0395                            4.34%
Partial idles aggregated on dual core    3              0.0204                            2.15%
Partial idles aggregated on quad core    3              0.0033                            0.36%

All-states prediction: useful for location-aware optimization

All-busy, all-idle, partial-idle prediction: useful for shared power plane optimization
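
For readers who want to reproduce this kind of evaluation on their own traces, the sketch below computes a mean square error and a relative error between predicted and ground-truth residency values. The slides do not define the relative error, so the definition used here (mean absolute error divided by the mean ground-truth value) is an assumption and may differ from the one behind the table.

```python
def mean_square_error(predicted, truth):
    """Average of squared differences between prediction and ground truth."""
    if len(predicted) != len(truth) or not truth:
        raise ValueError("series must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(predicted, truth)) / len(truth)


def relative_error(predicted, truth):
    """Assumed definition: mean absolute error divided by mean ground-truth value."""
    if len(predicted) != len(truth) or not truth:
        raise ValueError("series must be non-empty and of equal length")
    mean_abs_error = sum(abs(p - t) for p, t in zip(predicted, truth)) / len(truth)
    mean_truth = sum(truth) / len(truth)
    return mean_abs_error / mean_truth
```

For example, predictions of 0.5 and 0.3 against ground-truth values of 0.4 and 0.4 give a mean square error of 0.01 and, under the assumed definition, a relative error of 25%.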

Benefit Analysis Method


Trace idle-busy events on a real quad-core processor

Simulate OSPM C-state decision making (baseline)

Simulate C-state decisions based on the prediction result

Prediction error injected

Cycle-by-cycle C-state power and transition energy accumulated

Accumulated energy / run time = average power

Compare prediction-based C-state selection against the OSPM baseline
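
The energy accounting in the method above can be sketched as follows. The sketch accumulates energy per interval rather than per cycle, and every number in the two tables is a made-up placeholder, not a value from the ACPI specification or from the authors' processor.

```python
# Hypothetical per-state power (mW) and one-off transition energy (µJ).
STATE_POWER_MW = {"C0": 1000.0, "C1": 500.0, "C2": 250.0, "C3": 100.0}
TRANSITION_ENERGY_UJ = {"C0": 0.0, "C1": 1.0, "C2": 10.0, "C3": 50.0}


def average_power_mw(trace):
    """trace: list of (state, duration in µs). Returns average power in mW."""
    energy_uj = 0.0
    run_time_us = 0.0
    for state, duration_us in trace:
        # mW * µs gives nJ; divide by 1000 to accumulate in µJ.
        energy_uj += STATE_POWER_MW[state] * duration_us / 1000.0
        # Charge the cost of entering and leaving this state once per interval.
        energy_uj += TRANSITION_ENERGY_UJ[state]
        run_time_us += duration_us
    if run_time_us == 0.0:
        raise ValueError("trace has zero run time")
    # µJ / µs is W; convert to mW.
    return energy_uj / run_time_us * 1000.0
```

Feeding the same idle-busy trace through two different C-state selection policies and comparing the two averages gives the kind of power delta reported on the next slide.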

Benefit Result

                       Linux C-state policy   Our prediction   Improvement
% wrong decisions*     26%                    4.3%             Wrong decisions reduced by 83%
Average CPU power      3492 mW**              3064 mW**        Additional power saving of 428 mW**
Performance impact     2.1%***                0.09%***         Performance improved by 2%***

*: power delta < transition energy (if OSPM selects C2/C3) or idle length > C2/C3 latency (if OSPM selects C1)

**: based on the power numbers in figure 1 (source: ACPI spec [1]). They do not represent the real power of our experimental processor.

***: based on the latency numbers in figure 1 (source: ACPI spec [1]). They do not represent the real latency of our experimental processor.
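
The improvement column follows directly from the two result columns; the short check below reproduces the arithmetic. The helper name `relative_reduction` is introduced here only for illustration.

```python
def relative_reduction(baseline, improved):
    """Fraction by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline


# Wrong decisions: 26% under the Linux policy vs 4.3% with prediction.
wrong_decision_reduction = relative_reduction(26.0, 4.3)

# Average CPU power in mW: the difference is the additional saving.
power_saving_mw = 3492 - 3064

# Performance impact in percentage points.
perf_impact_delta = 2.1 - 0.09
```

The first value rounds to 83%, the second is exactly 428 mW, and the third is about 2 percentage points, consistent with the figures quoted in the table.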

Summary & Future Work


Good problem modeling and prediction are key to taking full advantage of the deep C-state's power benefit

KFM works for CPU package pattern prediction for SPECWeb

Future work: evaluate more workloads under more general assumptions

Q & A