Prediction of CPU Idle-Busy Activity Pattern
Authors: Qian Diao, Justin Song
Presented by: Justin Song
Intel Corporation
14th International Symposium on High-Performance Computer Architecture
Salt Lake City, UT
Feb 18, 2008
Agenda
• Introduction
• Usage model
• CPU idle pattern
• Prediction algorithm
• Result
• Benefit analysis
• Summary & future work
Problem
• C-state: CPU idle state (no instructions being executed)
• C-state-based CPU power management: potentially big benefit
• Workloads rarely saturate multi-core CPUs
• C-state technology is maturing
  – lower power, higher compute efficiency, Si support
• How C-states are used: broken
• Today: only OSPM selects the C-state for each logical CPU (core/thread)
• Many wrong decisions
  – performance regression, or power waste
• Performance concerns may prevent deep C-state enabling
Figure: ACPI table and Linux C-state policy
Linux C-state policy, by last C-state:
  – C1: after 4 consecutive idles longer than the C2 latency, choose C2 for the next idle
  – C2: after 10 consecutive idles longer than the C3 latency, choose C3 for the next idle
  – C2/C3: if the last idle was shorter than the C2/C3 latency, demote
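The promotion and demotion rules above can be read as a small state machine. The Python sketch below encodes them as listed on the slide. It is not the Linux kernel's governor code. The latency values passed in are hypothetical. Demoting exactly one level (C3 to C2, C2 to C1) is an assumption, since the slide only says "demote".

```python
from dataclasses import dataclass


@dataclass
class LadderPolicy:
    """Sketch of the slide's Linux C-state promotion/demotion rules.

    Latencies are hypothetical inputs in microseconds; demoting one level
    at a time is an assumption (the slide only says "demote").
    """
    c2_latency_us: float
    c3_latency_us: float
    state: str = "C1"   # C-state chosen for the next idle period
    streak: int = 0     # consecutive idles long enough to justify promotion

    def next_state(self, last_idle_us: float) -> str:
        """Feed the idle period that just ended; return the C-state for the next idle."""
        # Demotion: the last idle was shorter than the current deep state's latency.
        if self.state in ("C2", "C3"):
            latency = self.c2_latency_us if self.state == "C2" else self.c3_latency_us
            if last_idle_us < latency:
                self.state = "C1" if self.state == "C2" else "C2"
                self.streak = 0
                return self.state
        # Promotion: C1 -> C2 after 4 consecutive idles > C2 latency,
        #            C2 -> C3 after 10 consecutive idles > C3 latency.
        if self.state == "C1":
            self.streak = self.streak + 1 if last_idle_us > self.c2_latency_us else 0
            if self.streak >= 4:
                self.state, self.streak = "C2", 0
        elif self.state == "C2":
            self.streak = self.streak + 1 if last_idle_us > self.c3_latency_us else 0
            if self.streak >= 10:
                self.state, self.streak = "C3", 0
        return self.state
```

Because each decision depends only on recent idle history, a burst of short idles after a long quiet phase leaves the CPU in a deep state for at least one wrong decision before demotion kicks in.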
GOOD Prediction Helps
• No worry about performance drop
• Possible causes of performance degradation from deep C-states
  – Upcoming C0% too high (e.g., >90%): no headroom to accommodate deep C-states
• Equivalent statement: the upcoming idle duration is too short, so the deep C-state's latency cannot be amortized
  – Under some circumstances, deep C-states prevent proprietary Si optimizations for performance compensation from working
  – Thread context loss
  – On-core/package cache flush
• Deep C-state's power benefit maximized
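The amortization argument can be made concrete with a common break-even check. Entering a deeper C-state pays off only if two conditions hold: the energy saved while resident exceeds the transition energy, and the idle period outlasts the exit latency. The sketch below assumes that formulation. The function names and all numbers are illustrative, not taken from the deck. With power in mW and time in µs, energy comes out in nJ.

```python
def break_even_us(p_shallow_mw: float, p_deep_mw: float, transition_nj: float) -> float:
    """Idle length (us) at which the deep state's residency saving equals its transition energy."""
    return transition_nj / (p_shallow_mw - p_deep_mw)


def deep_c_pays_off(idle_us: float, p_shallow_mw: float, p_deep_mw: float,
                    transition_nj: float, latency_us: float) -> bool:
    """True if entering the deep state for idle_us microseconds is worthwhile.

    Saved energy (mW * us = nJ) must exceed the transition energy, and the idle
    period must be longer than the exit latency (assumed break-even criterion).
    """
    saved_nj = (p_shallow_mw - p_deep_mw) * idle_us
    return idle_us > latency_us and saved_nj > transition_nj
```

An accurate prediction of the upcoming idle length is exactly what this check needs as input.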
Our Methodology
• Modeling problem
• Use easy-to-observe metrics
• Need domain knowledge assistance (Si PM optimization)
• Prediction model: DBN (Dynamic Bayesian Networks)
• Generalization of HMM and LDS (KFM)
• Combines a natural mechanism for expressing domain knowledge with efficient algorithms for learning and inference
• Model evaluation
• Model simplification
• For deployment in SW/FW/HW
• Power benefit / perf impact quantification
Usage Model
Use the activity prediction result to direct C-state usage and performance compensation
CPU Package Activity State
• Package activity
  – idle-busy activity of all cores
• All-core idle
• All-core busy
• Package partial idle (at least one core idle and one core busy)
  – All other 2^N - 2 states (N = # of cores)
• How PM benefits from the definition
• The idle-busy pattern (not the OSPM-selected C-state) reflects the workload's timing nature
• Aligned with shared-power-plane design
  – Only when all cores are idle can the package's memory and I/O control logic go to a lower power state
  – Only when at least one core is idle can the active cores' performance be compensated
  – Breakdown of package partial idle: core location information
Figure: Quad-core CPU package activity state change over time
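Here is a minimal sketch of the package activity state definition. It assumes one bit per core, with 1 = busy and 0 = idle; that encoding is an assumption. For N cores, it enumerates all 2^N states and labels each as all-core busy, all-core idle, or partial idle. For a quad-core package, that leaves 2^4 - 2 = 14 partial-idle states.

```python
from itertools import product

# Assumed encoding: one entry per core, 1 = busy (C0), 0 = idle.
PackageState = tuple[int, ...]


def classify(state: PackageState) -> str:
    """Map one package activity state to its aggregate class."""
    if all(state):
        return "all-core busy"
    if not any(state):
        return "all-core idle"
    return "partial idle"  # at least one core idle and one core busy


def enumerate_states(n_cores: int) -> dict[PackageState, str]:
    """All 2^N package activity states of an N-core package, with their class."""
    return {s: classify(s) for s in product((0, 1), repeat=n_cores)}
```

Keeping the per-core tuple, rather than only the class, preserves the core location information that the partial-idle breakdown provides.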
CPU Idle Pattern
• Definition: residency % of each package activity state during an observation time slot
• How prediction benefits from the definition
• Prediction of the package idle pattern: the random variable becomes discrete
  – Prediction of idle duration: hard to use a discrete prediction model
• A single core's idle duration prediction cannot help whole-package power saving and performance compensation
  – Hard to know whether cores' idles overlap
Figure: Woodcrest activity pattern time distribution (dual-core CPU package activity pattern over time): residency % of activity states (0,0), (0,1), (1,0), (1,1) per time slot
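The idle-pattern definition (residency % of each package state per observation slot) can be sketched as below. The sketch assumes per-core activity is sampled at a fixed rate inside each slot, using a 1 = busy / 0 = idle encoding. The function name and the sampling scheme are hypothetical.

```python
from collections import Counter
from itertools import product


def idle_pattern(samples: list[tuple[int, ...]], n_cores: int) -> dict[tuple[int, ...], float]:
    """Residency fraction of each package activity state within one observation slot.

    samples: per-core activity sampled uniformly inside the slot
    (assumed encoding: 1 = busy, 0 = idle). States never observed get 0.0.
    """
    if not samples:
        raise ValueError("an observation slot needs at least one sample")
    if any(len(s) != n_cores for s in samples):
        raise ValueError("every sample must have one entry per core")
    counts = Counter(samples)
    return {s: counts[s] / len(samples) for s in product((0, 1), repeat=n_cores)}
```

Each slot becomes one fixed-length vector of fractions that sum to 1. That vector is the observation the predictor consumes.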
Prediction Algorithm
• Kalman Filter Model (KFM) used for prediction
• Time series (observed CPU package patterns) treated as a Markov process
• Observation made every 500 us
• KFM generalized in Dynamic Bayesian Networks
  – Explicit probability definition (Bayesian theory)
  – Good network structure description (graph theory)
• Algorithm
• Inputs
  – T observed historical CPU package patterns. Each state's percentage series is defined as an independent variable.
  – A priori state transition, deviation, and observation covariance
• Interim outputs
  – Hidden conditional probability distribution
• Final outputs
  – Prediction for the (T+1)th CPU package pattern
• Inference
  – Forward operator (1 to T)
  – Backward operator (T+1 back to 1)
• Complexity (T: # of history observations; N: # of activity states)
  – O(TN^3)
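Here is a minimal sketch of the forward/backward inference. It assumes each state's residency series is modeled by its own one-dimensional Kalman filter with a random-walk transition. The parameter values (a, q, r, x0, p0) are assumptions standing in for the a priori transition, deviation, and observation covariance inputs. The backward pass is the standard Rauch-Tung-Striebel smoother; the deck does not say which smoother variant it uses. Treating each variable as a scalar makes this sketch O(T) per state.

```python
from dataclasses import dataclass


@dataclass
class ScalarKFM:
    """Hypothetical 1-D linear-Gaussian model for one state's residency series:
    x_t = a * x_{t-1} + w_t, w ~ N(0, q);   y_t = x_t + v_t, v ~ N(0, r)."""
    a: float = 1.0    # a priori state transition (assumed random walk)
    q: float = 1e-4   # process noise variance (assumed)
    r: float = 1e-2   # observation noise variance (assumed)
    x0: float = 0.0   # prior mean of the hidden state
    p0: float = 1.0   # prior variance of the hidden state

    def forward(self, ys):
        """Forward operator over t = 1..T: one-step predictions and filtered estimates."""
        x, p = self.x0, self.p0
        xp, pp, xf, pf = [], [], [], []
        for y in ys:
            x_pred, p_pred = self.a * x, self.a * p * self.a + self.q
            k = p_pred / (p_pred + self.r)  # Kalman gain
            x, p = x_pred + k * (y - x_pred), (1.0 - k) * p_pred
            xp.append(x_pred)
            pp.append(p_pred)
            xf.append(x)
            pf.append(p)
        return xp, pp, xf, pf

    def backward(self, xp, pp, xf, pf):
        """Backward (Rauch-Tung-Striebel) operator over t = T..1: smoothed means."""
        xs = list(xf)
        for t in range(len(xf) - 2, -1, -1):
            g = pf[t] * self.a / pp[t + 1]  # smoother gain
            xs[t] = xf[t] + g * (xs[t + 1] - xp[t + 1])
        return xs

    def predict_next(self, ys):
        """Prediction of the (T+1)th value given T observations."""
        _, _, xf, _ = self.forward(ys)
        return self.a * xf[-1]


def predict_pattern(history, model=None):
    """history: T patterns, each a list of per-state residency fractions.
    Each state's series is handled as an independent variable; clipping to
    [0, 1] and renormalizing to a sum of 1 are added assumptions."""
    model = model if model is not None else ScalarKFM()
    raw = [model.predict_next([h[i] for h in history]) for i in range(len(history[0]))]
    clipped = [min(max(v, 0.0), 1.0) for v in raw]
    total = sum(clipped)
    return [v / total for v in clipped] if total > 0 else clipped
```

In a linear-Gaussian model, the one-step prediction for slot T+1 depends only on the filtered estimate at T, so `predict_next` uses the forward pass alone. The backward pass produces the smoothed curve.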
Algorithm Simplification
• 2^N states → 3 states (all busy, all idle, partial idle)
• One-step forward and backward computation
• Forward: store the (T-1)th step's intermediate results
• Backward: compute only step T+1
• Complexity of the simplified algorithm
• Best case: O(1)
• Worst case: O(T), when the stored history intermediate results must be discarded and computation restarted
Figure: Co-processor-based prediction time estimate
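The simplification can be sketched in two parts. First, aggregate the 2^N residencies into the 3 classes. Second, keep only the previous step's filtered mean and variance, so that each new 500 us observation is an O(1) update. `restart` models the O(T) worst case of replaying the history. The function, class, and method names, the 1 = busy encoding, and the parameter defaults are all hypothetical.

```python
def aggregate(pattern):
    """Collapse a 2^N-state pattern into [all busy, all idle, partial idle].
    pattern: dict mapping per-core tuples (assumed 1 = busy, 0 = idle) to residency fractions."""
    all_busy = sum(v for s, v in pattern.items() if all(s))
    all_idle = sum(v for s, v in pattern.items() if not any(s))
    partial = sum(v for s, v in pattern.items() if any(s) and not all(s))
    return [all_busy, all_idle, partial]


class IncrementalKF:
    """1-D Kalman filter that stores only the previous step's filtered mean and
    variance, so each new observation costs O(1) (hypothetical parameters)."""

    def __init__(self, a=1.0, q=1e-4, r=1e-2, x0=0.0, p0=1.0):
        self.a, self.q, self.r = a, q, r
        self.x0, self.p0 = x0, p0
        self.x, self.p = x0, p0

    def update(self, y):
        """Forward step: fold observation y into the stored (T-1) result."""
        x_pred = self.a * self.x
        p_pred = self.a * self.p * self.a + self.q
        k = p_pred / (p_pred + self.r)
        self.x = x_pred + k * (y - x_pred)
        self.p = (1.0 - k) * p_pred

    def predict(self):
        """One-step-ahead prediction for slot T+1."""
        return self.a * self.x

    def restart(self, history):
        """Worst case: discard stored results and replay the whole history, O(T)."""
        self.x, self.p = self.x0, self.p0
        for y in history:
            self.update(y)
```

Three such filters (one per aggregated class) replace the 2^N filters of the full model, which is what makes the per-slot cost small enough for SW/FW/HW deployment.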
Result
Figure annotations: smoothed follows observed very well; predicted value vs. ground-truth value; the distance from the ground truth is the prediction error
For a DP CPU, 4 variables: (busy, busy)%, (busy, idle)%, (idle, busy)%, (idle, idle)%; only 3 of them are independent because the four sum to 100%; no aggregation for partial idles
Result – Cont'd
Data set | Dual core | Quad core | Partial idles aggregated on dual core | Partial idles aggregated on quad core
Number of states | 4 | 16 | 3 | 3
Mean square error of prediction | 0.0387 | 0.0395 | 0.0204 | 0.0033
Relative error | 4.08% | 4.34% | 2.15% | 0.36%
All-state prediction: useful for location-aware optimization
All-busy, all-idle, partial-idle prediction: useful for shared-power-plane optimization
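Here is one way to compute the mean square error reported in the table. It assumes the error is averaged over every time slot and every state variable. The deck does not define the averaging, nor how the relative-error row is derived, so this sketch covers only the MSE.

```python
def mean_square_error(predicted, observed):
    """Mean of squared differences over all time slots and all state variables.

    predicted, observed: equal-length lists of patterns (lists of residency fractions).
    Averaging over every slot and state is an assumption, not the deck's stated formula.
    """
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty series of equal length")
    errors = [(p - o) ** 2
              for pred_slot, obs_slot in zip(predicted, observed)
              for p, o in zip(pred_slot, obs_slot)]
    return sum(errors) / len(errors)
```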
Benefit Analysis Method
• Trace idle-busy events on a real quad-core processor
• Simulate OSPM C-state decision making (baseline)
• Simulate C-state decisions based on the prediction result
• Prediction error injected
• Cycle-by-cycle C-state power and transition energy accumulated
• Accumulated energy / run time = average power
• Compare prediction-based C-state selection against the OSPM baseline
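The method above can be sketched as an event-driven energy accumulator:

- C0 power accrues during busy intervals.
- Each idle interval adds residency power plus transition energy.
- Accumulated energy divided by run time gives average power.

This is a coarse per-interval version of the cycle-by-cycle accounting. The C-state table values and the `choose` callback are hypothetical. Exit latency is reported separately as a rough performance-impact proxy, which is also an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CState:
    power_mw: float        # residency power while in the state
    transition_nj: float   # energy spent entering and leaving the state
    exit_latency_us: float


# Hypothetical numbers for illustration only (not the deck's figure 1 / ACPI values).
C0_POWER_MW = 5000.0
C_STATES = {
    "C1": CState(power_mw=1000.0, transition_nj=0.0, exit_latency_us=1.0),
    "C3": CState(power_mw=200.0, transition_nj=20000.0, exit_latency_us=100.0),
}


def simulate(trace, choose):
    """trace: list of ("busy" | "idle", duration_us) events from an idle-busy trace.
    choose(duration_us) -> C-state name for that idle period (baseline or prediction-based).
    Returns (average power in mW, total exit latency in us)."""
    energy_nj = 0.0  # mW * us = nJ
    runtime_us = 0.0
    latency_us = 0.0
    for kind, duration in trace:
        runtime_us += duration
        if kind == "busy":
            energy_nj += C0_POWER_MW * duration
        else:
            c = C_STATES[choose(duration)]
            energy_nj += c.power_mw * duration + c.transition_nj
            latency_us += c.exit_latency_us
    return energy_nj / runtime_us, latency_us
```

Here `choose` receives the true idle length, so it can act as an oracle. A prediction-based policy would instead decide from a predicted value carrying its own error, analogous to the error injection described above.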
Benefit Result
Metric | Linux C-state policy | Our prediction | Improvement
% wrong decisions* | 26% | 4.3% | Wrong decisions reduced by 83%
Average CPU power | 3492 mW** | 3064 mW** | Additional power saving: 428 mW**
Performance impact | 2.1%*** | 0.09%*** | Performance improved by 2%***
*: A decision counts as wrong when the power delta does not pay back the transition energy (if OSPM selects C2/C3) or when the idle length exceeds the C2/C3 latency (if OSPM selects C1).
**: Based on power numbers in figure 1 (source: ACPI spec [1]); these do not represent the real power of our experimental processor.
***: Based on latency numbers in figure 1 (source: ACPI spec [1]); these do not represent the real latency of our experimental processor.
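The Improvement column follows from the other two columns. The quick check below reads "performance improved by 2%" as the difference between the two performance-impact percentages. That reading is an interpretation; the slide does not state it.

```python
# Values from the Benefit Result table (Linux C-state policy vs. prediction).
linux_wrong_pct, pred_wrong_pct = 26.0, 4.3
linux_power_mw, pred_power_mw = 3492, 3064
linux_perf_pct, pred_perf_pct = 2.1, 0.09

# Relative reduction in wrong decisions: (26 - 4.3) / 26 ≈ 83.46%, reported as 83%.
wrong_reduction_pct = (linux_wrong_pct - pred_wrong_pct) / linux_wrong_pct * 100
# Additional power saving: 3492 - 3064 = 428 mW.
power_saving_mw = linux_power_mw - pred_power_mw
# Difference in performance impact: 2.1 - 0.09 = 2.01 percentage points, reported as ~2%.
perf_gain_pts = linux_perf_pct - pred_perf_pct

print(f"{wrong_reduction_pct:.1f}% fewer wrong decisions, "
      f"{power_saving_mw} mW saved, {perf_gain_pts:.2f} pts less perf impact")
```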
Summary & Future Work
• Good problem modeling and prediction are key to fully exploiting deep C-states' power benefit
• The KFM model works for CPU package pattern prediction on SPECWeb
• Future work: evaluate more workloads under more general assumptions
Q & A