Energy Analysis of Re-Injection Based Deadlock Recovery Routing Algorithms

Jul 18, 2012 (5 years and 1 month ago)


H. Kooti*, M. Mirza-Aghatabar*, S. Hessabi+
Computer Engineering Department
Sharif University of Technology
Tehran, Iran
*{kooti, aghatabar}@ce.sharif.edu, +hessabi@sharif.edu

A. Tavakkol
School of Computer Science
Institute for Studies in Fundamental Science (IPM)
Tehran, Iran
arasht@ipm.ir


Abstract: There are two strategies for handling deadlock in
NoC routing algorithms: deadlock avoidance and deadlock
recovery. Some deadlock recovery routing algorithms are
re-injection based, such as Compressionless Routing (CR),
Software-Based (SW_TFAR) and AFBAR. Although their performance
has been compared, no existing work has focused on the energy
consumption of these routing algorithms. We evaluate them
according to their energy consumption and latency. Our
experimental results show that deadlock recovery routing
algorithms achieve better performance but worse energy
consumption than deadlock avoidance routing algorithms. In
addition, among the recovery algorithms, AFBAR has the best
energy consumption and CR the worst.

I. INTRODUCTION

Deadlock is the most formidable obstacle that routing
algorithms in wormhole-switched interconnection networks
must address and overcome. Deadlock avoidance has been the
traditional approach to the deadlock problem: routing is
restricted so that no cyclic dependency exists between
channels. As an alternative, deadlock detection and recovery
has gained attention, since it imposes no limit on routing
adaptivity.
Schemes based on deadlock avoidance generally suffer
from losses in adaptivity and/or increased hardware
complexity, which negatively impact performance. In [1,2], it
was shown that deadlocks rarely occur when sufficient
routing freedom is provided. Restricting the routing algorithm
to prevent such a rare event is therefore hard to justify, and
recovery schemes have gained consideration in the scientific
community [3-6].
Deadlock recovery routing algorithms include CR [3],
SW_TFAR [4], AFBAR [5] and Disha [6]. SW_TFAR and AFBAR are
progressive, while CR is regressive. In these three schemes,
when a deadlock is detected, the message is removed from the
network, either by ejecting it at the node containing the
header flit (SW_TFAR and AFBAR) or by having the source node
kill it (CR). The removed message is re-injected into the
network at a later time.
While the network performance of different deadlock
recovery routing algorithms has been studied rigorously in the
past [3-6], their energy behavior has not been explored.
The goal of this paper is to provide a detailed evaluation of
the performance and energy of re-injection based deadlock
recovery routing algorithms. Section II reviews related work.
Section III describes the architecture of our simulator and
presents the evaluation metrics, namely latency and energy
consumption. Section IV presents the experimental results,
and Section V concludes the paper.
II. RELATED WORK

One re-injection based deadlock recovery algorithm is
Compressionless Routing (CR), proposed in [3]. In CR, the
source node keeps track of the injected message and detects
whether its header has reached the destination node. Once the
header flit has been delivered to the destination, the message
can no longer be involved in a deadlock; otherwise, if the
message has been blocked for some time, the source tears down
the partial message path, kills the presumably deadlocked
message, and tries sending it again later.
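The source-side behavior described above can be sketched as a small decision function. This is only an illustration of the kill-and-retry idea, not the mechanism of [3]: the names `Action` and `cr_source_action`, and the `timeout` parameter (how many blocked cycles the source tolerates), are assumptions introduced here.

```python
from enum import Enum


class Action(Enum):
    SAFE = "safe"                      # header delivered, no deadlock possible
    KEEP_SENDING = "keep_sending"      # not delivered yet, still within the timeout
    KILL_AND_RETRY = "kill_and_retry"  # tear down the path and re-inject later


def cr_source_action(header_delivered: bool, blocked_cycles: int, timeout: int) -> Action:
    """Decide what a CR-style source does with its in-flight message.

    `timeout` is a hypothetical threshold; the paper only says the message
    is killed after being blocked "for some time".
    """
    if header_delivered:
        return Action.SAFE
    if blocked_cycles >= timeout:
        return Action.KILL_AND_RETRY
    return Action.KEEP_SENDING
```

A larger `timeout` would presumably mean fewer unnecessary kills but slower reaction to a real deadlock; the source does not say how the threshold is chosen.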
Another re-injection based deadlock recovery routing
algorithm is SW_TFAR, introduced in [4]. It has several
notable advantages: (1) it requires very little hardware
because, unlike Disha [6], it needs no dedicated buffer to
handle deadlocks; (2) it eliminates performance degradation at
the saturation point by limiting message injection; and (3) it
uses a new deadlock detection technique that considerably
reduces the probability of false deadlock detection.
AFBAR is also a re-injection based algorithm and is an
improved version of SW_TFAR. The authors of [5] improved the
deadlock detection mechanism of SW_TFAR: AFBAR decreases the
number of false deadlock detections and consequently achieves
higher performance.
In this paper, we analyze the re-injection based
deadlock recovery routing algorithms under different
implementations and usage scenarios in terms of their
energy dissipation and performance. We also compare our
results with deadlock avoidance routing algorithms, which are
analyzed in detail in [7].

1-4244-2542-6/08/$20.00 ©2008 IEEE

Figure 1. Latency comparison of deadlock recovery and deadlock avoidance routing algorithms, panels (a) and (b)
III. SIMULATOR AND EVALUATION METRICS

XMulator [8] was used to implement the routing algorithms
and to perform the simulation experiments. XMulator is a
complete, flit-level, event-based, and extensively detailed
package for simulating interconnection networks. Orion [9] is
used in this paper to measure the energy consumption of the
NoC. We performed our simulations on torus topologies of two
different sizes (4×4×4 and 8×8). Three conventional traffic
models were selected: uniform, local and hotspot. Both
deadlock recovery and deadlock avoidance routing algorithms
were implemented. A fully adaptive deadlock avoidance routing
algorithm needs at least 3 virtual channels to work properly,
whereas deadlock recovery routing algorithms need only 1
virtual channel [10]. Therefore, we implemented the deadlock
recovery algorithms with both 1 and 3 virtual channels so that
they can be compared fairly with the deadlock avoidance
routing algorithms.
We use latency and energy per flit as the metrics for
evaluating our experimental results. Latency is defined as the
time that elapses between the injection of a message header
into the network at the source node, including the queuing
time at the source, and the reception of the corresponding
tail flit at the destination [10]. Energy consumption in NoCs
consists of two components: the energy consumed in routers
and the energy consumed in links. Both static and dynamic
energy consumption are considered. We used a 65 nm library,
and the clock frequency is set to 1.5 GHz based on critical
path calculations.
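The two metrics defined above can be written down as a short sketch. The names `EnergyBreakdown`, `message_latency`, `energy_per_flit` and `cycles_to_seconds` are hypothetical, and dividing the total energy by the number of delivered flits is an assumption about how "energy per flit" is normalized; the paper does not spell this out.

```python
from dataclasses import dataclass


@dataclass
class EnergyBreakdown:
    """Energy in joules, split by component (router, link) and kind (dynamic, static)."""
    router_dynamic: float
    router_static: float
    link_dynamic: float
    link_static: float

    def total(self) -> float:
        return (self.router_dynamic + self.router_static
                + self.link_dynamic + self.link_static)


def message_latency(queued_at_cycle: int, tail_received_cycle: int) -> int:
    """Cycles from entering the source queue to tail-flit reception.

    Source queuing time is included, as in the definition in the text.
    """
    if tail_received_cycle < queued_at_cycle:
        raise ValueError("tail flit cannot arrive before the message is queued")
    return tail_received_cycle - queued_at_cycle


def energy_per_flit(energy: EnergyBreakdown, delivered_flits: int) -> float:
    """Assumed normalization: total network energy divided by delivered flits."""
    if delivered_flits <= 0:
        raise ValueError("delivered_flits must be positive")
    return energy.total() / delivered_flits


def cycles_to_seconds(cycles: int, clock_hz: float = 1.5e9) -> float:
    """Convert cycles to seconds at the 1.5 GHz clock stated in the text."""
    return cycles / clock_hz
```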
IV. EXPERIMENTAL RESULTS

In this section, we first present the latency of the
re-injection based deadlock recovery algorithms and compare it
with that of deadlock avoidance routing algorithms (i.e.,
Duato and XY). We then analyze the energy consumption of
these routing algorithms under three traffic models: uniform,
local 40% and hotspot 11%. In addition, we analyze their
energy consumption in the low traffic region of the local and
hotspot traffic models with different percentages. The local
and hotspot traffic models are combined with the uniform
traffic model. For example, local 40% means that 40% of the
messages are distributed locally and the remaining 60% are
distributed uniformly.
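The way a local or hotspot percentage is combined with uniform traffic can be sketched as a destination picker. This is a minimal illustration under assumptions: `pick_destination` and its parameters are hypothetical names, and what counts as a "local" target (for example, the neighbors of the source) is left to the caller because the paper does not define it.

```python
import random
from typing import Sequence


def pick_destination(src: int,
                     nodes: Sequence[int],
                     special_targets: Sequence[int],
                     fraction: float,
                     rng: random.Random) -> int:
    """Pick a destination for a message generated at `src`.

    With probability `fraction` the destination is drawn from
    `special_targets` (the local nodes for the local model, or the single
    hot node for the hotspot model). Otherwise it is drawn uniformly from
    all nodes other than `src`.
    """
    if rng.random() < fraction:
        return rng.choice(list(special_targets))
    others = [n for n in nodes if n != src]
    return rng.choice(others)
```

For local 40%, `fraction` would be 0.4 and `special_targets` the nodes considered local to `src`; for hotspot 11%, `fraction` would be 0.11 and `special_targets` a one-element list holding the hot node.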
A. Latency Analysis
Figure 1 shows the latency comparison of deadlock
recovery and avoidance algorithms under the hotspot traffic
model. Simulation results under the other traffic models were
similar, with a few exceptions that we discuss in this
section. A routing algorithm with more virtual channels
usually achieves better performance [10]. Figure 1 shows that
every deadlock recovery routing algorithm performs better
with 3 virtual channels than with 1 virtual channel, and this
held under all traffic models. A key point of this paper is
the dependency of latency on the network's diameter. The
diameter of the 4×4×4 torus is 6 and the diameter of the 8×8
torus is 8. Although both topologies have 64 nodes, the
unequal diameters lead to different latency behaviors in some
situations.
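The two diameters quoted above follow from the wrap-around links of a torus: in a ring of k nodes the farthest node is floor(k/2) hops away, and the dimensions add up. A short sketch (the function names are ours, not from the paper):

```python
from math import prod
from typing import Sequence


def torus_diameter(dims: Sequence[int]) -> int:
    """Diameter of a torus: the farthest node is k // 2 hops away per dimension."""
    return sum(k // 2 for k in dims)


def node_count(dims: Sequence[int]) -> int:
    """Total number of nodes in a torus with the given radix per dimension."""
    return prod(dims)


for dims in [(4, 4, 4), (8, 8)]:
    print(dims, node_count(dims), torus_diameter(dims))
# prints:
# (4, 4, 4) 64 6
# (8, 8) 64 8
```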
AFBAR and SW_TFAR perform better than the Duato routing
algorithm with 3 virtual channels in the 2-dimensional torus
topology under all traffic models. However, this does not hold
in the 4×4×4 topology, where the best latency belongs to
Duato with 3 virtual channels under the local and uniform
traffic models. The shorter diameter of the 4×4×4 topology
leads to less blocking time; furthermore, each node has 6
input/output ports, which enhances the adaptivity available at
each node for routing packets. On the other hand, the kill or
ejection and re-injection procedures of deadlock recovery
routing algorithms degrade performance. The Duato routing
algorithm does not suffer from this overhead and also benefits
from the high node adaptivity of the 4×4×4 topology, which
leads to slightly better performance. Under the hotspot
traffic model, the blocking time is high because of the heavy
traffic around the hot node. Hence, Duato cannot benefit from
the greater adaptivity of each node in the 4×4×4 topology, and
Figure 1 (a) shows the better performance of deadlock recovery
routing algorithms with 3 virtual channels in comparison with
Duato.
Unlike the 4×4×4 topology, in the 8×8 topology each node has
only 4 input/output ports, which lowers routing adaptivity.
Hence, the more efficient use of virtual channels by deadlock
recovery routing algorithms gives them better performance
than the Duato routing algorithm.

[Figure 1 panels: (a) Torus 4×4×4, Hotspot 11%; (b) Torus 8×8, Hotspot 11%. Axes: latency versus injection rate. Series: Duato, XY, SW_TFAR 1VC, SW_TFAR 3VC, AFBAR 1VC, AFBAR 3VC, CR 1VC, CR 3VC.]

Figure 2. Energy consumption comparison of deadlock recovery and deadlock avoidance routing algorithms, panels (a) to (f)
In all conditions, the best performance among deadlock
recovery routing algorithms belongs to AFBAR. The gap is
considerable under the hotspot traffic model, since AFBAR uses
a more efficient deadlock detection mechanism [5].
Another key point is that CR performs worse than AFBAR and
SW_TFAR with both 1 and 3 virtual channels in all
circumstances, since the deadlock recovery overhead of CR is
higher than that of the other two.
B. Energy Analysis
Figure 2 shows the energy consumption of these routing
algorithms under the hotspot, local and uniform traffic
models in the 4×4×4 and 8×8 tori. The most important
observation is that the worst energy consumption belongs to
CR with 1 virtual channel, since the worst delay is associated
with CR (Figure 1). In addition, the CR deadlock recovery
procedure increases the number of flits transferred per cycle,
which increases the power consumption as well.
Another important point is that, in most cases, Duato
achieves the best energy consumption among all routing
algorithms. Its use of three virtual channels results in
better performance and lower delay. Additionally, because it
is a deadlock avoidance algorithm, it never kills or re-injects
packets. Hence, it has lower power consumption, as it does not
increase the number of flits transferred per cycle the way
deadlock recovery routing algorithms do.
There is one exception where Duato does not have the best
energy consumption: the 4×4×4 torus with hotspot traffic
(Figure 2 (a)). As mentioned above, the smaller diameter of
this topology reduces the number of blocked packets, so
SW_TFAR and AFBAR detect fewer packets involved in a deadlock
cycle. Therefore, the number of re-injected packets and the
number of flits transferred per cycle decrease, which leads to
a lower power-delay product, i.e., lower energy consumption.

[Figure 2 panels: (a) Torus 4×4×4, Hotspot 11%; (b) Torus 8×8, Hotspot 11%; (c) Torus 4×4×4, Local 40%; (d) Torus 8×8, Local 40%; (e) Torus 4×4×4, Uniform; (f) Torus 8×8, Uniform. Axes: energy versus injection rate. Series: Duato, XY, SW_TFAR 1VC, SW_TFAR 3VC, AFBAR 1VC, AFBAR 3VC, CR 1VC, CR 3VC.]

Figure 3. Energy consumption of deadlock recovery and deadlock avoidance routing algorithms at different traffic percentages
Another point is that deadlock recovery routing
algorithms with 1 virtual channel consume more energy than
those with 3 virtual channels in all cases. Although more
virtual channels cause higher power consumption, they do not
increase energy consumption: the more efficient use of
virtual channels by deadlock recovery routing algorithms
improves performance enough to outweigh the additional power
drawn by the extra virtual channels.
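This trade-off can be stated compactly if energy is treated as the power-delay product, as the text does elsewhere. The symbols P (average power) and T (delay) are notation introduced here for illustration:

```latex
E = P \cdot T,
\qquad
E_{3\mathrm{VC}} < E_{1\mathrm{VC}}
\iff
\frac{P_{3\mathrm{VC}}}{P_{1\mathrm{VC}}} < \frac{T_{1\mathrm{VC}}}{T_{3\mathrm{VC}}}
```

In words, three virtual channels save energy whenever the relative increase in power is smaller than the relative reduction in delay, which is what the measurements reported here indicate for the deadlock recovery algorithms.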
Our experimental results show that AFBAR consumes less
energy than SW_TFAR. The energy consumption gap between AFBAR
and SW_TFAR is more pronounced with 1 virtual channel than
with 3 virtual channels, as the better deadlock detection
mechanism of AFBAR is more effective when fewer virtual
channels are available.
We also consider the energy consumption of these routing
algorithms at low traffic under the local and hotspot traffic
models with different percentages. Figure 3 shows the
results. Interestingly, increasing the percentage of local
traffic decreases the energy consumption. This is because
higher locality reduces the average distance and hence leads
to lower delay and a lower power-delay product, i.e., lower
energy consumption. On the other hand, increasing the
percentage of hotspot traffic increases the energy
consumption. This is because more blocking time leads to more
channel monitoring [11], which increases the power
consumption and hence the energy consumption.
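The effect of locality on average distance can be illustrated with a small calculation. This is a sketch under a stated assumption: a "local" message is taken to travel `local_hops` hops (1 by default), which the paper does not specify. The function names are hypothetical.

```python
from itertools import product
from typing import Sequence


def ring_distance(offset: int, k: int) -> int:
    """Shortest hop count between two nodes `offset` apart on a ring of k nodes."""
    return min(offset, k - offset)


def mean_uniform_distance(dims: Sequence[int]) -> float:
    """Mean minimal hop count from a node to every other node of a torus.

    A torus looks the same from every node, so one source is enough.
    """
    total = 0
    count = 0
    for offsets in product(*(range(k) for k in dims)):
        if any(offsets):  # skip the source node itself
            total += sum(ring_distance(o, k) for o, k in zip(offsets, dims))
            count += 1
    return total / count


def mean_mixed_distance(dims: Sequence[int],
                        local_fraction: float,
                        local_hops: float = 1.0) -> float:
    """Mean distance when `local_fraction` of messages travel `local_hops` hops
    (an assumption) and the rest are uniformly distributed."""
    return (local_fraction * local_hops
            + (1.0 - local_fraction) * mean_uniform_distance(dims))
```

Under this assumption the mean distance falls linearly as the local fraction grows, in line with the trend reported for Figure 3.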
V. SUMMARY AND CONCLUSIONS

In this paper, we studied re-injection based deadlock
recovery routing algorithms. We compared them with each other
and with deadlock avoidance routing algorithms in terms of
latency and energy consumption.
AFBAR has the best performance among the deadlock recovery
routing algorithms, while CR has the worst. Another important
result is that increasing the number of virtual channels in
deadlock recovery routing algorithms leads to lower energy
consumption. In most cases, Duato has the best energy
consumption, owing to its good performance and low power
consumption. In addition, a high percentage of hotspot
traffic leads to higher energy consumption, while a high
percentage of local traffic leads to lower energy
consumption.
In conclusion, whenever energy consumption is the critical
parameter for a designer, the Duato deadlock avoidance
routing algorithm is the better selection, and whenever
performance or delay is the critical parameter, the AFBAR
deadlock recovery routing algorithm is the best choice.
REFERENCES

[1] T.M. Pinkston and S. Warnakulasuriya, "On deadlocks in
interconnection networks," the 24th International Symposium on
Computer Architecture, June 1997
[2] S. Warnakulasuriya and T.M. Pinkston, "Characterization of
deadlocks in interconnection networks," In Proc. of the 11th
International Parallel Processing Symposium, April 1997
[3] J. Kim, Z. Liu and A. Chien, "Compressionless Routing: a framework
for adaptive and fault-tolerant routing," In Proc. of the 21st
International Symposium on Computer Architecture, pp. 289-300,
April 1994
[4] J.M. Martinez, P. Lopez, J. Duato and T.M. Pinkston,
"Software-Based deadlock recovery technique for true fully adaptive
routing in wormhole networks," 1997 International Conference on
Parallel Processing, August 1997
[5] M. Mirza-Aghatabar, A. Tavakkol, H. Sarbazi-Azad, "An adaptive
software-based deadlock recovery technique," IEEE International
Conference on Advanced Information Networking and Applications,
pp. 514-519, March 2008
[6] K. V. Anjan and T. M. Pinkston, "DISHA: a deadlock recovery
scheme for fully adaptive routing," In Proc. of the 9th International
Parallel Processing Symposium, pp. 537-543, April 1995
[7] M. Mirza-Aghatabar, S. Koohi, S. Hessabi, and Massoud Pedram,
"An empirical investigation of mesh and torus NoC topologies under
different routing algorithms and traffic models," in Proceedings of
the 10th IEEE Euromicro Conference on Digital System Design, pp.
19-26, 2007
[8] A. Nayebi, S. Meraji, A. Shamaei, H. Sarbazi-Azad, "XMulator: an
object oriented XML-Based Simulator," in Asia International
Conference on Modeling & Simulation, pp. 128-132, 2007
[9] H. S. Wang, X. Zhu, L. S. Peh and S. Malik, "Orion: a
power-performance simulator for interconnection networks," In
Proceedings of MICRO 35, Istanbul, Turkey, November 2002
[10] L. Ni and C. Glass, "The Turn Model for adaptive routing," In Proc.
of the 19th International Symposium on Computer Architecture, IEEE
Computer Society, Vol. 20, No. 2, pp. 278-287, May 1992
[11] S. Koohi, M. Mirza-Aghatabar, S. Hessabi, "Evaluation of traffic
pattern effect on power consumption in mesh and torus-based
Network-on-Chips," International Symposium on Integrated Circuits,
Singapore, 2007
[Figure 3 panels: Torus, Hotspot; Torus, Local. Axes: energy versus traffic percentage. Series: Duato (4×4×4), Duato (8×8), SW_TFAR (4×4×4), SW_TFAR (8×8), AFBAR (4×4×4), AFBAR (8×8).]