PRACTICAL DYNAMIC THERMAL MANAGEMENT ON INTEL DESKTOP COMPUTER



Guanglei Liu
Department of Electrical and Computer Engineering
Florida International University
July 12, 2012

Major Professor: Dr. Gang Quan

Thermal Design Challenges

Figure from Intel Microprocessor Technology Lab, 2011

Number of transistors keeps increasing
  Nearly 40 billion transistors are integrated into a single die [Mizunuma, ICCAD 2009]

More complicated architectures are built
  An 80-core single-chip processor has been demonstrated by Intel [Vangal, ISSCC 2007]

Environmental concerns
  In the U.S., 46% of electricity is generated from fossil fuels.

Electric bill
  U.S. datacenters: 120 billion kilowatt-hours in 2012
  9 billion dollars, 15% of all energy in the U.S.


High transistor density increases power density

High power density raises on-chip temperatures and causes thermal issues

Source: Environmental Protection Agency (EPA) Report

Thermal Issues

Increase package/cooling costs
  1-3 dollars per watt [Skadron, ISCA 2003]
  In a data center, each watt spent on computing requires ½-1 watt for cooling [Brill, 2007]

Affect reliability
  As much as a 50% reduction in a device's life span for every 10 °C increase [Yeo, DAC 2008]

Degrade performance
  10-15% more circuit delay for each 15 °C increase [Santarini, EDN 2005]

Crash the computing system
  The processor's self-protection mechanism automatically shuts down the processor to avoid physical damage [Rohou, WFDO 1999]

Increase leakage power consumption
  Raising the temperature from 65 °C to 110 °C can increase leakage power by 38% for integrated circuits [Santarini, EDN 2005]

Computing system cooling solutions

Mechanical Cooling Solution

Air-cooling (e.g. fan + heat sink)
  Cooling takes 51% of the overall server power budget [Lefurgy, COM 2003]
  Noise level increases by 10 dB as fan speed increases by 50% [Lyon, STMMS 2004]

Liquid-cooling
  High-density liquid absorbs 3500 times more heat than air [Chu, DMR 2004]
  High cooling cost

Dynamic Thermal Management (DTM)


  Dynamic voltage and frequency scaling (DVFS) technique [Kim, HPCA 2008]
  Task migration [Lim, QED 2002]
  Clock gating [Gunther, ITJ 2001]
  Fetch toggling [Brooks, HPCA 2001]

These techniques sacrifice system performance

Related Theoretical Work

Our research goal:
To develop a practical hardware platform that enables us to investigate the limitations of the existing theoretical work, and to develop practical and effective DTM techniques that accommodate those limitations

The existing theoretical works are derived from simplified mathematical thermal models and idealized assumptions

Thermal-aware throughput maximization
  [Chantem et al., ISLPED 2009]
  [Zhang et al., ICCAD 2007]
  [Chatha et al., DAC 2010]

Peak temperature minimization
  [Chaturvedi et al., ASPDAC 2011]
  [Liu et al., RTAS 2010]
  [Qiu et al., ICESS 2010]

Overall energy reduction under peak temperature constraints
  [Bao et al., DATE 2010]
  [Andrei et al., DAC 2009]
  [Huang et al., DATE 2011]

Real-time guarantee under peak temperature constraint
  [Chaturvedi et al., CIT 2010]
  [Wang et al., RTS 2006]
  [Huang et al., RTSS 2009]

Major contributions

Thermal management validation [SUSCOM 2012]
  DTM techniques vs. air-cooling
  DTM vs. DPM algorithm
  Fundamental DTM principles validation

Reactive DTM, single-core [GreenCom 2012]
  Limitations of theoretical works
  Non-constant sampling period
  Thermal profiling analysis

Practical hardware platform [SouthEast 2011]
  Intel i5 quad core
  Linux operating system

Proactive DTM algorithm, multi-core [DATE 2012] [ASP2012]
  Neighbor-aware temperature prediction
  Algorithm for multicore with task migration

Practical Hardware Platform

Dell Precision T1500 workstation with an Intel i5 quad-core processor, running Linux kernel version 2.6.23

  Temperature capturing: CoreTemp driver, reads the on-chip thermal sensors
  Computing system hardware monitoring tool: lm-sensors, monitors system information (temperature value, fan speed, voltage value)
  DVFS technique: Cpufreq module, 12 different speed levels
  Fan speed control: Fancontrol shell script, manually adjusts fan speed
  Task migration: CPU_affinity module, migrates processes between cores
  SPEC benchmark: SPEC CPU2000, integer and floating-point operations
  Power measurement: Fluke current clamp and multimeter, cooling/CPU power consumption
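
The platform components above are all reachable from user space on Linux. The Python sketch below shows one possible way to interpret a temperature reading, list the available speed levels, and pin a process to a core. It is only an illustration under assumptions: the helper names are hypothetical, the sysfs paths in the comments differ between kernel versions and machines, and nothing here is taken from the authors' own tooling.

```python
import os
from pathlib import Path


def parse_millidegrees(text):
    """Convert a hwmon/coretemp reading (millidegrees Celsius) to degrees Celsius."""
    return int(text.strip()) / 1000.0


def parse_speed_levels_khz(text):
    """Parse the contents of a cpufreq 'scaling_available_frequencies' file (kHz)."""
    return sorted(int(token) for token in text.split())


def read_text(path):
    """Read a small sysfs file as text."""
    return Path(path).read_text()


def pin_to_core(pid, core):
    """Restrict process `pid` (0 means the caller) to a single core (Linux only)."""
    os.sched_setaffinity(pid, {core})


# Example paths (assumptions, check the local system before use):
#   temperature: /sys/class/hwmon/hwmon0/temp2_input
#   speed levels: /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies
# Typical use:
#   temp_c = parse_millidegrees(read_text("/sys/class/hwmon/hwmon0/temp2_input"))
```

Keeping the parsing separate from the file access makes the helpers easy to check on a machine that does not expose these sysfs files.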

Our Approach

Enhanced reactive DTM (ERDTM)

Build a temperature vs. speed lookup table
  Run benchmarks at different speed levels
  Collect the corresponding peak temperatures
Offline thermal profiling analysis

Buffer zone and safe region
  [Figure: temperature versus time, marking the safe region, the buffer zone, T_safe and T_THRESHOLD]
  The maximum possible temperature increment is 4 °C
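
The lookup-table idea above can be sketched in a few lines of Python. The slides do not state how the table is consulted at run time, so the selection rule here is an assumption: run at the fastest speed while the temperature is in the safe region, and once it enters the buffer zone fall back to the fastest speed whose profiled peak temperature stays at or below the threshold. Defining T_safe as the threshold minus the 4 °C maximum increment is likewise an assumption, and all function names and sample numbers are hypothetical.

```python
def build_lookup_table(profile_runs):
    """Map each speed level to the highest peak temperature seen offline.

    profile_runs: iterable of (speed_level, peak_temperature_celsius) pairs,
    one pair per profiled benchmark run.
    """
    table = {}
    for speed, peak in profile_runs:
        table[speed] = max(peak, table.get(speed, float("-inf")))
    return table


def pick_speed(table, threshold_c):
    """Fastest speed whose profiled peak is at or below the threshold.

    Falls back to the slowest profiled speed if no entry qualifies.
    """
    safe = [speed for speed, peak in table.items() if peak <= threshold_c]
    return max(safe) if safe else min(table)


def next_speed(current_temp_c, table, threshold_c, max_increment_c=4.0):
    """Full speed in the safe region, table-selected speed in the buffer zone."""
    t_safe = threshold_c - max_increment_c  # assumed definition of T_safe
    if current_temp_c <= t_safe:
        return max(table)
    return pick_speed(table, threshold_c)


# Hypothetical profiling data: (speed in GHz, peak temperature in Celsius).
runs = [(1.2, 41.0), (2.0, 48.5), (2.0, 50.0), (2.66, 58.0)]
table = build_lookup_table(runs)
```

With a 55 °C threshold, `next_speed(50.0, table, 55.0)` keeps the 2.66 GHz level because 50 °C is still in the assumed safe region, while `next_speed(53.0, table, 55.0)` drops to 2.0 GHz.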


Experimental results

Experiment setup
  Four identical tasks are assigned to the four cores to simulate a single-core environment
  Temperature threshold is 55 °C
  The lookup table is constructed offline

[Table: frequency lookup table]

Number of violations
  FSDTM algorithm: 87
  VS-DTM algorithm: 12
  ERDTM algorithm: 0

DTM algorithm performance evaluation

[Figure: throughput of FSDTM, VS-DTM and ERDTM on the SPEC CPU2000 benchmarks galgel, ammp, lucas, equake, vpr, gcc, parser and crafty; y-axis "Throughput (%)", ticks from 0.96 to 1.1]

ERDTM average throughput improvement is 8.1%


Neighbor-aware temperature prediction

Our neighbor-aware prediction
  Individual increment factor: processor temperature increment
  Neighbor increment factor: heat transfer from the neighbor processor
  The prediction uses two weights, which are obtained offline by collecting training data

Training process
  Run the tasks and record temperature information
  Apply least-squares estimation
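
The training step can be illustrated with a small least-squares fit. The prediction formula itself did not survive in this transcript, so the sketch assumes the simplest form consistent with the description: a predicted increment equal to a weighted sum of an individual factor and a neighbor factor. The function names, variable names and sample data are hypothetical.

```python
def fit_two_weights(samples):
    """Least-squares fit of y ~= w_self * x_self + w_neigh * x_neigh.

    samples: iterable of (x_self, x_neigh, y) tuples recorded during training.
    Solves the 2x2 normal equations directly, so no external library is needed.
    """
    s11 = s12 = s22 = b1 = b2 = 0.0
    for x_self, x_neigh, y in samples:
        s11 += x_self * x_self
        s12 += x_self * x_neigh
        s22 += x_neigh * x_neigh
        b1 += x_self * y
        b2 += x_neigh * y
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("training data cannot separate the two factors")
    w_self = (b1 * s22 - b2 * s12) / det
    w_neigh = (b2 * s11 - b1 * s12) / det
    return w_self, w_neigh


def predict_increment(w_self, w_neigh, x_self, x_neigh):
    """Predicted temperature increment under the assumed linear model."""
    return w_self * x_self + w_neigh * x_neigh


# Hypothetical training records: (individual factor, neighbor factor, observed increment).
training = [(1.0, 0.0, 0.8), (0.0, 1.0, 0.3), (2.0, 1.0, 1.9), (1.0, 2.0, 1.4)]
w_self, w_neigh = fit_two_weights(training)
```

Because the weights are fitted offline, the run-time cost of a prediction is just the two multiplications in `predict_increment`.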

Neighbor-aware Task Migration

Conventional approach:
  Always migrate the task from the hottest core to the coolest core.

NADTM algorithm
  Predict thermal emergency
  Migrate task
  DVFS technique

Heat factor: to evaluate the processor hotness
Increasing factor: to evaluate the temperature increment

Our migration strategy:
  Choose the migration candidate with the minimum
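
To make the contrast with the conventional policy concrete, here is a hedged Python sketch. The quantity that the migration strategy minimizes is missing from this transcript, so the score used below (heat factor plus increasing factor) is purely a placeholder assumption, as is the rule that DVFS is applied only when migration is not possible. All names and numbers are hypothetical.

```python
def coolest_core(temps, source):
    """Conventional policy: the coolest core other than the source."""
    others = {core: t for core, t in temps.items() if core != source}
    return min(others, key=others.get)


def pick_migration_target(factors, source):
    """Candidate with the smallest combined score.

    factors: dict core_id -> (heat_factor, increasing_factor).
    The plain sum used as the score is a placeholder assumption.
    """
    scores = {core: heat + inc
              for core, (heat, inc) in factors.items() if core != source}
    return min(scores, key=scores.get)


def nadtm_step(predicted_temp_c, threshold_c, can_migrate):
    """Order of actions listed on the slide: predict, then migrate, then DVFS."""
    if predicted_temp_c <= threshold_c:
        return "keep"
    return "migrate" if can_migrate else "dvfs"


# Hypothetical snapshot: core 0 is hot; core 1 is coolest but heats up fastest.
temps = {0: 54.0, 1: 46.0, 2: 48.0, 3: 50.0}
factors = {0: (54.0, 1.0), 1: (46.0, 3.5), 2: (48.0, 0.5), 3: (50.0, 0.2)}
```

In this made-up snapshot the conventional policy picks core 1 because it is coolest, whereas the combined score picks core 2, which is slightly warmer but expected to heat up less.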

Performance analysis

Single task
  An average of 3.6% overall throughput improvement
Multiple tasks
  An average of 5.8% overall throughput improvement

The NADTM algorithm can effectively keep the temperature under the threshold
It has a small temperature oscillation of 1 °C

Thank You for Your Attention!

Journals

1. Guanglei Liu, M. Fan, G. Quan, M. Qiu, “On-Line Predictive Thermal Management under Peak Temperature Constraints for Practical Multi-core Platforms”, Journal of Low Power Electronics (ASP) (under review), 2012.

2. Guanglei Liu, G. Quan, M. Qiu, “Practical Dynamic Thermal Management on An Intel Desktop Computer”, Embedded Software Design, Journal of Sustainable Computing (SUSCOM) (under review), 2012.

3. H. Huang, V. Chaturvedi, Guanglei Liu, G. Quan, “Leakage Aware Scheduling On Maximum Temperature Minimization For Periodic Hard Real-Time Systems”, Journal of Low Power Electronics (ASP), 2012.

Peer Reviewed Conferences


1. Guanglei Liu, M. Fan, G. Quan, “Neighbor-Aware Dynamic Thermal Management for Multi-core Platform”, The 15th Design, Automation, and Test in Europe (DATE 2012), Dresden, Germany, March 12-16, 2012.

2. Guanglei Liu, G. Quan, M. Qiu, “The Practical On-line Scheduling for Throughput Maximization on Intel Desktop Platform under the Maximum Temperature Constraint”, The 2011 IEEE/ACM Green Computing and Communications (GreenCom 2011), Sichuan, China, August 4-5, 2011.

3. Guanglei Liu, G. Quan, “Thermal Aware Scheduling on an Intel Desktop Computer”, IEEE SouthEast Conference (SouthEast 2011), Nashville, Tennessee, March 17-20, 2011.

4. Guanglei Liu, J. Fan, “Framework for Statistical Analysis of Homogeneous Multi-core Power Grid Networks”, IEEE 8th International Conference on ASIC (ASICON 2009), Changsha, China, October 20-23, 2009.

5. C. Liu, J. Tan, R. Chen, Guanglei Liu, J. Fan, “Thermal Aware Clocktree Optimization in Nanometer VLSI Systems Considering Temperature Variations”, IEEE 40th Southeastern Symposium on System Theory (SSST 2008), New Orleans, LA, March 17-18, 2008.