Deadlock Free Routing Algorithms for Mesh Topology NoC Systems with Regions


Jul 18, 2012 (5 years and 4 months ago)




Rickard Holsmark (1), Maurizio Palesi (2) and Shashi Kumar (1)
(1) Jönköping University, Sweden
(2) DIIT, University of Catania, Italy
(1) {rickard.holsmark, shashi.kumar}@ing.hj.se
(2) mpalesi@diit.unict.it


Abstract

The region concept helps to accommodate cores larger
than the tile size in mesh topology NoC architectures.
In addition, it offers many new opportunities for NoC
design, but it also raises new design issues and
challenges. The most important among these is the
design of a deadlock free routing algorithm. In this
paper, we present and compare two routing algorithms
for mesh topology NoC with regions. The first
algorithm is borrowed from the area of fault tolerant
networks and is adapted for the NoC context. We
compare this with an algorithm designed using a
methodology for design of application specific routing
algorithms for communication networks. Our study
shows that the application specific routing algorithm
not only provides much higher adaptivity, but also
superior performance as compared to the other
algorithm in all traffic cases.

Keywords: Routing Algorithms, Networks on Chip,
Deadlock, Wormhole Switching, Application Specific
Routing

1. Introduction

Network on Chip (NoC) is slowly being accepted
as an important paradigm for implementing
communication among various cores in a SoC.
Network topology and routing algorithms are the two
most important aspects which distinguish various
proposed NoC architectures [1,2,3,4]. A two-dimensional
mesh topology with a fixed tile size is favored by
many research groups because of its layout efficiency,
good electrical properties and simplicity in addressing
on-chip resources. Such a physically homogeneous
network is not efficient for incorporating cores of
different sizes. In such a network, the tile size must
be able to accommodate the physically
largest core, such as a shared memory. It will also be
hard to reuse earlier designed multi-core sub-systems
within a fixed tile size based NoC. To overcome these
problems the concept of a region was proposed in [1].
This concept allows a rectangular area, larger than a
tile, in the mesh to be declared as a region. The region
is isolated from the outside network using a wrapper as
shown in Fig. 1.
In a NoC system with regions, routing of packets
becomes more complex. Some network routers are
removed from the mesh network to accommodate a
large region. In effect, a region acts as an obstacle to
the network traffic. This not only results in higher
packet latency, but also means that deadlock free
routing algorithms designed for regular mesh networks
are no longer usable.
Wormhole switching, used in communication
networks, has been proposed by several researchers,
e.g. [3,4], as most suitable for on-chip communication.
A drawback of this switching technique is the
increased possibility of deadlocks. To solve the
problem of deadlock, many algorithms have been
proposed for mesh topology networks in the literature.
For example, the simple X-Y routing algorithm and
Turn-model based [5] algorithms like west-first are
deadlock free in mesh networks. However, none of
these can be used for meshes with regions, as messages
cannot get around the regions because of the
restrictions on the allowed turns.
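
The limitation can be illustrated with a minimal sketch of X-Y routing on a mesh in which a region has taken the place of some routers. The coordinate convention, the set of removed routers and the function names are assumptions made for this illustration only; they are not taken from any of the cited algorithms.

```python
def xy_next_hop(cur, dst):
    """X-Y routing: correct the X offset first, then the Y offset."""
    (x, y), (dx, dy) = cur, dst
    if x != dx:
        return (x + (1 if dx > x else -1), y)
    if y != dy:
        return (x, y + (1 if dy > y else -1))
    return None  # already at the destination


def xy_route(src, dst, removed):
    """Follow X-Y routing; return (path, delivered).

    Routing stops as soon as the single permitted hop leads to a removed router.
    """
    path, cur = [src], src
    while cur != dst:
        nxt = xy_next_hop(cur, dst)
        if nxt in removed:
            return path, False
        path.append(nxt)
        cur = nxt
    return path, True


# Hypothetical 2x2 region replacing four routers of a 7x7 mesh (assumed placement).
region = {(3, 3), (4, 3), (3, 4), (4, 4)}

print(xy_route((0, 0), (6, 2), region))  # route that never touches the region
print(xy_route((0, 3), (6, 3), region))  # route stopped at the region's west border
```

Because X-Y routing offers exactly one next hop at every router, the second message has no permitted way to step around the obstacle, which is the situation described above.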



Fig. 1. Region within a mesh topology NoC


Bolotin et al. [3] have also proposed a
non-homogeneous mesh topology NoC allowing
rectangular cores larger than the mesh tile. Their
solution to deadlock free routing is to use X-Y routing
extended with hard coded paths for region affected
traffic.
A problem similar to regions occurs when
designing fault-tolerant routing algorithms for mesh
networks. Several of these algorithms consider faults to
be contained in rectangular blocks similar to regions.
In this category, virtual channels [6] have been used to
facilitate the design of such algorithms [7]. Still, the
use of virtual channels adds resources and increases
design complexity. Some researchers have proposed
fault tolerant algorithms without the use of virtual
channels. These are based on non-adaptive routing
algorithms that are modified to work in the presence of
faults or regions. Wu [8] uses modified X-Y routing to
route around faulty blocks, but also imposes some
restrictions. In [9] an algorithm that is less restricted
was proposed.
Duato [10] has proposed a general theory for
developing highly adaptive deadlock free routing
algorithms for general communication networks
that use the wormhole switching technique. The basic
idea in Duato’s theory is to identify a set of
consecutive communication channels in the network
which, if used concurrently, can cause a deadlock
situation. The solution is to prevent this situation.
Most of the deadlock free routing algorithms
proposed in the literature are general purpose and have
been designed to handle worst case communication
patterns in the network. A NoC system specialized for
a set of applications can be regarded as a semi-static
system. Here we can know which pairs of cores
communicate and which pairs never communicate.
This information about the communication topology
can be incorporated in Duato’s theory to design highly
adaptive routing algorithms. We call such algorithms
Application Specific Routing Algorithms (APSRAs)
[11]. APSRA has not yet been used for development of
deadlock free routing in mesh NoC with regions.

2. Region Concept and New Design Issues

The region concept presented in [1] was intended
for larger resources that do not fit in the fixed
sized slot of a regular mesh architecture layout. The
region concept could also be useful for encapsulating a
group of resources which have very high and special
communication requirements that cannot be
supported by the general NoC communication
infrastructure. Within such a region, one could have
specialized interconnections as well as communication
protocols for achieving the required performance. One
can also think about encapsulation of a group of
resources as a region for special requirements such as
low power consumption or data security.
The above applications of regions may seem to imply
that the region structure has to be physically different
in design from its surroundings. That is, however, not
necessary; it is also possible to think of the region as a
logical structure. In this case the internal hardware
design of the region is identical with the outside NoC
structure but is somehow isolated from the surrounding
network. This assumes that there are configurable
routers in the NoC that can be used for defining and
maintaining a region.
We feel that reuse of multi-core subsystems will
become a very important application of the region
concept in the near future. For example, multi-media
solutions currently available as separate SoCs can be
reused. It is unlikely that these subsystems will
physically fit in the general slot for a core in the mesh
NoC. Without the region concept the subsystem will
need to be redesigned keeping in view the NoC
constraints. The effort required to redesign may be too
high, or the redesigned subsystem may not be able to
achieve the required performance in the NoC context.

2.1. Routing in NoC with Regions

Efficient routing of messages within the network is
essential in order to fully exploit the power of the
computing resources and achieve good performance
for applications running on them. A good routing
algorithm should not only provide low latency for
messages but should also be deadlock free when the
network is concurrently routing multiple messages.
However, incorporating regions in mesh networks
results in a major change of the communication
infrastructure, and the existing mesh routing algorithms
cannot be directly reused.
In addition to creating problems of deadlock
freedom, regions also affect the traffic distribution in
the network. Traffic flows which get obstructed by the
region have to circumvent it in order to make progress.
This could make the border links of the region more
heavily used than other links. Adaptive
routing is one solution that can reduce the problem of
local congestion. Normally, the term adaptive refers to
the ability to sense congestion and take action to
divert from it. In this sense it is reactive. When regions
are used in a NoC, information about the regions can
be incorporated in the routing algorithm in advance, so
that the occurrence of congestion is reduced or avoided.

2.2. Accessing and Addressing Regions

Since a region occupies a larger area than a
standard resource, it may be useful to consider several
addresses and several access points to it. A large region
may internally provide different types of access
mechanisms to its internal resources. The purpose for
which the region is used might also have an effect on
how the region is designed. A large shared memory
perhaps requires several access points distributed
around the entire border, whereas a system with many
processing elements might be accessed only by a few
resources outside the region. When using a region the
issue of access-points and addresses to the region must
be defined.
The three major options, in order of increasing
routing complexity and accessing power, are:

1. Use a corner router which originally had a resource
connected to it as a single access point.
2. Use the routers on the border that originally had
connections to resources as multiple access
points.
3. Use all the possible routers on the boundary as
access points.

Fig. 1 illustrates how a region can be accessed using
multiple access points. In this figure, routers on the
region boundary connect through the wrapper to the
internal region core.

3. Deadlock Free Routing in NoC Systems

The deadlock free algorithms developed for
homogeneous mesh networks, like the Odd-Even routing
algorithm [12], cannot be directly used in a NoC with
regions. To be able to reach all destinations, the routing
algorithm has to take turns to get around the
region. This will in many situations violate rules that
were used to secure deadlock freedom in the
case of a homogeneous NoC. Breaking these rules in
order to reach a destination may result in a deadlock
situation.
In the following subsections we describe two
routing algorithms that we have used in our evaluation
of routing performance of NoC in the presence of
regions. They represent two different approaches that
can be used to guarantee deadlock free routing in a
NoC both with and without regions. Due to the
restrictions of on-chip resources, we present algorithms
that do not require virtual channels. However, it is
possible to include this feature to increase network
performance. The first approach is adopted from the
area of fault tolerant routing. It is a general routing
algorithm in the sense that it works for any traffic
scenario and region placement in a NoC. This results in
good scalability and it supports dynamic changes of
both architecture and communication patterns.
The second approach has evolved from knowledge
of the design optimization of embedded systems. It
relies on the assumption that communication among
tasks in an embedded application is known in advance.
This information about the communication is
incorporated when designing the routing algorithm. As
we need not consider all possible communication
patterns, fewer restrictions need to be applied on the
routes of the actual communications to avoid
deadlocks. Thus, an application specific routing
algorithm can have more adaptivity as compared to a
general algorithm. However, any change in
architecture or communication pattern requires a re-
analysis and possibly re-design of the complete routing
algorithm.

3.1. Algorithm from Fault Tolerance Area

Chen and Chiu [9] present a fault tolerant algorithm
that can be used for routing in the presence of regions.
However, the published algorithm had some errors
which have later been corrected. The improved version
of this algorithm has been used in [13] for routing in
the presence of regions in a deadlock free manner. We
describe the basic ideas of the original algorithm here;
for a thorough description, see [9]. For our purpose, a
faulty block as described in the original algorithm is
equivalent to a region.
Chen and Chiu [9] borrow the idea of rings and
chains from [7] to isolate the faulty nodes from the rest
of the network. For messages which do not encounter
any ring or chain, they allow non-adaptive routes
which use at most one turn from source to
destination. For messages encountering faulty blocks it
becomes necessary to allow some turns which are
forbidden during normal routing. Only a few
combinations of forbidden turns are allowed in a clever
manner such that these turns can never combine with
each other (or with the allowed normal turns) to form a
cycle. When routing on paths not affected by faults,
messages are forwarded in the network according to
their type, as illustrated in Fig. 2.



Fig. 2. Message types and corresponding allowed
routes in algorithm

A message is of type row first (RF) if its
destination lies to the west. If the destination is directly
to its north or south, it is a column first (CF) message.
A message of type RF therefore changes to CF when it
reaches the destination column. If the destination lies
to the east, the message is of type CF, except when the
destination is in the same row, in which case it is row
only (RO). A CF message changes to RO when it
reaches the destination row with the destination to its
east. An RO message, however, never changes its type.
If a message hits the border of a faulty block, special
rules apply depending on the type of the message and
on whether the border belongs to a fault ring or a fault
chain. The rules for routing around a faulty block
differ depending on whether the faults are surrounded
by an s-chain (a chain that touches the south border
only), a non s-chain (a chain that touches only the west
border, or the west and south borders) or a ring (all
other positions of rings and chains). Fig. 3 illustrates
routes for some messages traveling in the presence of
faulty blocks (regions). Messages are denoted by their
source (Sn) and destination (Dn).
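
The message typing rules for fault-free paths can be written down as a small sketch. It covers only the normal case described above, not the special rules for fault rings and chains, which are given in [9]. The coordinate convention (x grows eastward, y grows northward), the function names, and the reading that RF and RO messages travel along the row while CF messages travel along the column are assumptions made here for illustration.

```python
def message_type(cur, dst):
    """Classify a message by where its destination lies relative to its current node."""
    (x, y), (dx, dy) = cur, dst
    if (x, y) == (dx, dy):
        return None  # delivered
    if dx < x:
        return "RF"  # destination to the west: row first
    if dx == x:
        return "CF"  # destination directly north or south: column first
    return "RO" if dy == y else "CF"  # destination to the east


def normal_next_hop(cur, dst):
    """Fault-free forwarding implied by the message type (assumed interpretation)."""
    (x, y), (dx, dy) = cur, dst
    kind = message_type(cur, dst)
    if kind == "RF":
        return (x - 1, y)  # travel west along the row
    if kind == "RO":
        return (x + 1, y)  # travel east along the row
    if kind == "CF":
        return (x, y + (1 if dy > y else -1))  # travel along the column
    return None


def trace(src, dst):
    """Return the (node, type) pairs visited on a fault-free route."""
    steps, cur = [], src
    while cur != dst:
        steps.append((cur, message_type(cur, dst)))
        cur = normal_next_hop(cur, dst)
    return steps


print([kind for _, kind in trace((5, 1), (2, 4))])  # westbound message: RF, then CF
print([kind for _, kind in trace((1, 1), (4, 3))])  # eastbound message: CF, then RO
```

In both traces the type changes exactly once, which corresponds to the single turn allowed on paths not affected by faults.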



Fig. 3. Message routes when encountering fault rings
and chains

3.2. Application Specific Routing Algorithms

Typical routing algorithms for NoC systems are
designed for a specific network topology and are
independent of the application which will be
mapped on the NoC. If a small variation of the
topology occurs (e.g., due to the merging of tiles
of a mesh based network to form a region), the routers
need to be redesigned. The use of routing tables helps to
overcome this problem and makes the router general
and configurable.
Routing tables are filled with information which
enables communication between every pair of
network nodes. The constraint to be satisfied is that the
channel dependency graph (CDG) [10] should not
contain any cycle, which ensures that the routing is
deadlock free. To achieve this, some of the possible
paths that allow two nodes to communicate must be
prohibited, causing a degradation of routing
adaptiveness. This is a strong limitation in an
embedded system scenario, because the designer cannot
exploit knowledge of the application that will be
mapped on the NoC.
Often the designer knows which core pairs
communicate and which do not. To overcome this
limitation, a methodology to generate application
specific routing functions has been proposed in [11].
The basic idea of this methodology, known as APSRA
(APplication Specific Routing Algorithm), is to extend
Duato’s theory in such a way as to exploit the
designer's knowledge about communication
characteristics of the application being implemented.
As a result an application specific channel dependency
graph (ASCDG) is built incorporating knowledge
about the communication topology of the algorithm.

[Diagram blocks: Application to be mapped; Communication Graph (T1 ... Tn); Network Topology (P1 ... P13); Mapping Function; Comm. Concurrency (C1 ... Cm); APSRA; Routing Tables; Compression; Memory budget; Compressed Routing Tables]
Fig. 4. Overview of APSRA design methodology

In [11] it is proved that if the ASCDG is acyclic
then the routing is deadlock free. Since the ASCDG is
a sub-graph of the CDG, it is more likely to be
acyclic. This likelihood is quite high since, in practical
cases, each node of the network communicates with a
small subset of other nodes. The result is that a number
of dependencies that are present in the CDG (which is
built by conservatively assuming that all the network
node pairs will communicate) are not present in the
ASCDG (which is built from the actual
communicating pairs). If the ASCDG is not acyclic,
[11] proposes a heuristic that breaks all the cycles
with the objective of minimising the impact on the
degree of adaptiveness, under the constraint that
destination reachability is guaranteed.
Fig. 4 shows an overview of the APSRA design
flow. The starting point is the application being
implemented along with the network topology. The
application is divided into a graph of concurrent tasks
and, using a set of available IPs, the application tasks
are assigned and scheduled. Finally, a mapping
function is used to decide on which node of the network
each selected IP should be mapped.
Using this information APSRA generates a set of
routing tables (one for each router of the NoC), which
guarantee both reachability and deadlock freeness while
maximising the degree of adaptiveness.
The information about communication concurrency
could also be exploited to improve the adaptiveness.
Finally, a compression technique can be used to
compress the generated routing tables [14].
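
As a rough illustration of what a table based router stores, the sketch below derives per-router tables from a list of admissible paths. It is a simplification and not the table format of [11] or [14]: the tables are keyed on the destination only, and the node numbers, the path list and the function name are hypothetical.

```python
from collections import defaultdict


def build_routing_tables(admissible_paths):
    """Return {router: {destination: set of admissible next hops}}."""
    tables = defaultdict(lambda: defaultdict(set))
    for path in admissible_paths:
        dst = path[-1]
        # Every hop on an admissible path becomes an entry at the router it leaves.
        for here, nxt in zip(path, path[1:]):
            tables[here][dst].add(nxt)
    return tables


# Hypothetical admissible paths between a few communicating node pairs.
paths = [
    [1, 2, 5],  # node 1 to node 5, first alternative
    [1, 4, 5],  # node 1 to node 5, second alternative
    [4, 5, 2],  # node 4 to node 2, single admissible path
]
tables = build_routing_tables(paths)
print(sorted(tables[1][5]))  # two next hops: the router can choose adaptively
print(sorted(tables[4][2]))  # one next hop: no adaptivity for this pair
```

An entry with more than one next hop is where adaptivity shows up in the table. Note that keying on the destination alone merges all paths towards that destination, so a real implementation would have to check that the merged entries do not reintroduce prohibited paths.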
For the sake of example, let us consider the
communication graph and the topology graph depicted
in Fig. 5(a) and 5(b) respectively.
[Figure panels: (a) Communication Graph over tasks T1 to T6; (b) Topology Graph over nodes P1 to P6 with channels l12, l21, l14, l41, l23, l32, l25, l52, l36, l63, l45, l54, l56, l65; (c), (d), (e) channel dependency graphs over these channels, with T1↔T5 and T2↔T4 marked in (e)]

Fig. 5. Comparison of cyclic dependencies without and
with APSRA methodology

Although for this example the topology is
mesh-based, the approach is general and can be applied to
any network topology without modification. As
mapping function, let us consider $M(T_i) = P_i$,
$i = 1, 2, 3, 4, 5$.
The CDG for a minimal fully adaptive routing
algorithm is shown in Fig. 5(c). Since it contains six
cycles, Duato's theorem cannot assure the deadlock
freeness of minimal fully adaptive routing for this
topology. The number of cycles is reduced to two for
the ASCDG, as shown in Fig. 5(d). Although deadlock
freeness still cannot be assured in this case, the cycles
can be broken as follows. The application
specific channel dependency $l_{4,1} \to l_{1,2}$ is due to the
communication T4→T2. This communication can be
realized by both paths P4→P5→P2 and P4→P1→P2.
If the routing function is restricted so that
the latter path is prohibited, then the application
specific channel dependency $l_{4,1} \to l_{1,2}$ no longer
exists. In a similar way it is possible to break the
second cycle by removing, for instance, the dependency
$l_{1,4} \to l_{4,5}$ due to the communication T1→T5. However,
this restriction reduces the degree of adaptiveness of
the routing. Now suppose that we have some
knowledge about communication concurrency, and
that communication T1→T5 and
communication T2→T4 do not overlap in time.
Fig. 5(e) highlights the dependencies due to these
communications. Since the communications are not
concurrent, the associated dependencies are not
concurrently active either. The result is that the two
cycles are actually false cycles. In conclusion, for this
latter case minimal fully adaptive routing is deadlock
free.
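
The cycle argument of this example can be checked mechanically with a short sketch. It assumes a 2x3 mesh with P1, P2, P3 in the top row and P4, P5, P6 below, considers only the communications T1↔T5 and T2↔T4 with $M(T_i) = P_i$ for the application specific graph (the full communication graph of Fig. 5(a) is not reproduced here), and uses a plain depth-first search rather than the heuristic of [11]. All function names are hypothetical.

```python
from itertools import product

# Assumed 2x3 mesh layout: P1 P2 P3 on the top row, P4 P5 P6 on the bottom row.
COORD = {1: (0, 0), 2: (1, 0), 3: (2, 0), 4: (0, 1), 5: (1, 1), 6: (2, 1)}
NODE = {xy: n for n, xy in COORD.items()}


def minimal_paths(src, dst):
    """All minimal paths from src to dst, each as a list of node ids."""
    if src == dst:
        return [[src]]
    (x, y), (dx, dy) = COORD[src], COORD[dst]
    paths = []
    if x != dx:
        step = NODE[(x + (1 if dx > x else -1), y)]
        paths += [[src] + p for p in minimal_paths(step, dst)]
    if y != dy:
        step = NODE[(x, y + (1 if dy > y else -1))]
        paths += [[src] + p for p in minimal_paths(step, dst)]
    return paths


def dependencies(paths):
    """Channel dependencies l_ab -> l_bc induced by consecutive hops a, b, c."""
    return {((a, b), (b, c)) for p in paths for a, b, c in zip(p, p[1:], p[2:])}


def has_cycle(deps):
    """Depth-first search for a directed cycle in the dependency graph."""
    graph = {}
    for u, v in deps:
        graph.setdefault(u, []).append(v)
    state = {}  # channel -> 1 while on the DFS stack, 2 when finished

    def visit(u):
        state[u] = 1
        for v in graph.get(u, []):
            if state.get(v) == 1 or (v not in state and visit(v)):
                return True
        state[u] = 2
        return False

    return any(u not in state and visit(u) for u in list(graph))


def paths_for(pairs, banned=()):
    """Minimal paths for the given (source, destination) pairs, minus banned ones."""
    return [p for s, d in pairs for p in minimal_paths(s, d) if tuple(p) not in banned]


all_pairs = [(s, d) for s, d in product(COORD, repeat=2) if s != d]
app_pairs = [(1, 5), (5, 1), (2, 4), (4, 2)]  # T1<->T5 and T2<->T4

cdg_cyclic = has_cycle(dependencies(paths_for(all_pairs)))
ascdg_cyclic = has_cycle(dependencies(paths_for(app_pairs)))
# Prohibit P4->P1->P2 and P1->P4->P5, as in the cycle-breaking step of the example.
restricted_cyclic = has_cycle(
    dependencies(paths_for(app_pairs, banned={(4, 1, 2), (1, 4, 5)}))
)
print(cdg_cyclic, ascdg_cyclic, restricted_cyclic)  # prints: True True False
```

Under these assumptions both the all-pairs graph and the application specific graph contain cycles, and prohibiting the two paths leaves an acyclic graph while T4→T2 and T1→T5 remain reachable through P5 and P2 respectively. The sketch does not model communication concurrency, so it cannot reproduce the false-cycle argument of Fig. 5(e).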
4. Evaluation of Algorithms

4.1. Adaptivity Analysis

One metric to characterize an adaptive routing
algorithm is the degree of adaptiveness [5]. For a given
source destination pair the degree of adaptiveness is
defined as the ratio between the number of admissible
paths and the total number of paths connecting the
source node to the destination node. We calculated the
adaptiveness for a 7x7 NoC with a 2x2 region placed
either in the center of the NoC, with 4 access points or
1 access point, or at the bottom left corner, with 3
access points or 1 access point. Note that Chen and
Chiu’s algorithm is actually non-adaptive, and that the
reported adaptiveness is for comparison purposes only.
In all these configurations, the average degree of
adaptiveness exhibited by APSRA exceeded 80%,
while the values for Chen and Chiu’s algorithm were
in all cases slightly below 40%. To compare the
algorithms for different region sizes, we define a new
adaptivity measure called relative adaptivity. It
represents the ratio between the number of paths when
the region is present and the number of paths without
the region.
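
One possible reading of the relative adaptivity measure is sketched below for a single source and destination pair. It counts only Manhattan-minimal paths and treats the region as a set of blocked routers; both choices, as well as the function names and the example coordinates, are assumptions for illustration and do not reproduce the path counts produced by either routing algorithm.

```python
from functools import lru_cache
from math import comb


def count_minimal_paths(src, dst, blocked=frozenset()):
    """Count Manhattan-minimal paths from src to dst that avoid blocked routers."""
    sx = 1 if dst[0] >= src[0] else -1
    sy = 1 if dst[1] >= src[1] else -1

    @lru_cache(maxsize=None)
    def walk(x, y):
        if (x, y) in blocked:
            return 0
        if (x, y) == dst:
            return 1
        total = 0
        if x != dst[0]:
            total += walk(x + sx, y)
        if y != dst[1]:
            total += walk(x, y + sy)
        return total

    return walk(*src)


def relative_adaptivity(src, dst, region):
    """Paths with the region present divided by paths without the region."""
    with_region = count_minimal_paths(src, dst, frozenset(region))
    return with_region / count_minimal_paths(src, dst)


# Sanity check: without obstacles the count is the binomial coefficient C(6, 3).
print(count_minimal_paths((0, 0), (3, 3)) == comb(6, 3))

# Hypothetical regions between the two nodes.
print(relative_adaptivity((0, 0), (3, 3), {(1, 1)}))
print(relative_adaptivity((0, 0), (3, 3), {(1, 1), (2, 1), (1, 2), (2, 2)}))
```

Averaging such ratios over all communicating pairs would give a single figure per region size; how the averaging was done for the reported results is not specified here, so that step is left out.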
[Chart (a): relative adaptivity (0 to 0.5) for region sizes 1x1, 2x1, 2x2, 3x2, 3x3; series APSRA and Chiu]
[Chart (b): relative adaptivity (0 to 0.9) for region sizes 1x1, 2x2, 3x2, 3x3, 4x3, 4x4; series APSRA and Chiu]
Fig. 6. Relative adaptiveness vs. size of region: (a)
region in centre and (b) region in bottom left corner

Fig. 6(a) shows the relative adaptivity for different
region sizes located at the center of the NoC, whereas
Fig. 6(b) shows this variation for regions located at the
bottom left corner of the NoC. In both cases and for
each region the access point is located at the top right
corner. As expected, the relative adaptivity in general
decreases as the size of the region increases. For
regions located at the corner of the NoC there is a
minimum in relative adaptivity when the region size is
3x3 (about half the dimension of the mesh NoC). If the
region size increases further, the relative adaptivity
increases. This effect is caused by the fact that a region
located at the bottom left corner of the NoC obstructs
only communications between nodes located in the
north quadrant and the east quadrant of the region. The
number of these nodes is equal for regions 3x3, 4x3,
and 4x4. For this reason, whilst the number of paths
without the region decreases on average (because the
access point moves in the direction of the center of the
NoC), the number of paths with the region remains
nearly the same when the region size increases from
3x3 to 4x3 and further to 4x4.

4.2. Simulation Based Evaluation

For our evaluation purposes we have developed a
model of a 7x7 mesh topology NoC with regions in SDL
(Specification and Description Language). We have
implemented wormhole switching with a packet size of
10 flits. Every router has two-flit input and one-flit
output buffers. The router can simultaneously route
packets destined to non-conflicting output ports. The
minimal link delay is three cycles / flit and the
maximum link bandwidth is 0.5 flits / cycle (1 packet /
20 cycles). Cores are modeled as traffic generators and
the resource network interface has an output buffer large
enough to keep packet generation unaffected by
network conditions. The flits in a packet are sent in
burst mode at the maximum link bandwidth and the
gap between the packets is varied according to a
Poisson distribution. The destinations for generated
packets are randomly selected with a hot-spot probability
of 60 % for region access points. We compare APSRA
and Chen and Chiu’s algorithm with a region of size
2x2, either in the bottom left corner with 3 access points
(bl_ap3) or in the centre of the network with 4 access
points (c_ap4). Simulations were carried out using the
Telelogic SDL simulation tool (Tau 4.4).
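
The relation between injection rate and packet spacing, and one reading of the hot-spot destination choice, can be sketched as follows. The sketch is not the SDL model: the function names, the node coordinates and the interpretation that 60 % of packets go to a uniformly chosen access point are assumptions for illustration.

```python
import random

LINK_BANDWIDTH = 0.5  # flits per cycle (from the text)
PACKET_SIZE = 10      # flits per packet (from the text)


def mean_packet_interval(rate_percent):
    """Average cycles between packet starts at an injection rate in % of link bandwidth."""
    return PACKET_SIZE * 100.0 / (LINK_BANDWIDTH * rate_percent)


def pick_destination(src, nodes, access_points, rng, hot_spot=0.6):
    """With probability hot_spot pick a region access point, otherwise another node."""
    if rng.random() < hot_spot:
        pool = access_points
    else:
        pool = [n for n in nodes if n not in access_points]
    return rng.choice([n for n in pool if n != src])


print(mean_packet_interval(100))  # full link bandwidth: 20.0 cycles per packet
print(mean_packet_interval(5))    # 5 % of link bandwidth: 400.0 cycles per packet

# Hypothetical 7x7 node grid with four access points of a centrally placed region.
rng = random.Random(0)
nodes = [(x, y) for x in range(7) for y in range(7)]
access_points = [(2, 2), (2, 3), (3, 2), (3, 3)]
print(pick_destination((0, 0), nodes, access_points, rng))
```

At 100 % the interval equals the 20 cycles per packet stated above, which is a quick consistency check on the formula.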
The following parameters were used to study the
performance of a NoC platform. Performance values
were collected over 60 000 packets, after a warm-up
session of 30 000 packets.

• Average Latency: The average delay of a
packet from source (when the header leaves) to
the destination (when the tail has reached).
• Blocked Routing Cycles/Router: The total
number of routing cycles when packets were
blocked in a router.

Latency values were averaged over 5 random
traffic scenarios to get an overall view of how the
performance of the network is affected by changes in
network configuration and packet injection rate.
Blocked Routing Cycles/Router indicates where the
network is most congested.

Simulation Results
We classify communication traffic into three
types: communication traffic to the region, other
traffic where a resource other than the region is
the destination, and all communications, which is the
aggregate of the first two types of traffic.
[Chart: Average Latency, All Communications; x-axis: Packet Injection Rate (% of LBW), 1 to 8; y-axis: Latency (cycles), 33 to 53; series: apsra_bl_ap3, chiu_bl_ap3, apsra_c_ap4, chiu_c_ap4]

Fig. 7. Average latency for all communications
with region placed in bottom left (bl) and centre (c), vs.
packet injection rate in % of link bandwidth

The first result shows average latency for all
communications in the network, as depicted in Fig. 7.
The lowest latency values are obtained for APSRA
with a central region (apsra_c_ap4). The second lowest
latency values are obtained with Chen and Chiu’s
algorithm and a central region (chiu_c_ap4). Next is
APSRA with the region in the bottom left corner
(apsra_bl_ap3). The worst performance is shown by
Chen and Chiu’s algorithm with the region in the
bottom left corner (chiu_bl_ap3).
In Fig. 8 we give average latency for
traffic with destinations other than the region. The
worst combination from a latency point of view, up to
an injection rate of 5%, is Chen and Chiu’s algorithm
with the region in the centre (chiu_c_ap4). All the
other combinations provide similar latency values in
this range. However, when the injection rate is
increased above 5%, Chen and Chiu’s algorithm with
the region in the corner position (chiu_bl_ap3) rapidly
saturates. Next to saturate is APSRA with the region in
the corner (apsra_bl_ap3). The best result from a
saturation point of view is obtained with APSRA and
the region in the centre (apsra_c_ap4), although it has
slightly higher latency at lower injection rates.
[Chart: Average Latency, Other Traffic; x-axis: Packet Injection Rate (% of LBW), 1 to 7; y-axis: Latency (cycles), 35 to 43; series: apsra_bl_ap3, chiu_bl_ap3, apsra_c_ap4, chiu_c_ap4]
Fig. 8. Average latency for communications destined
outside region, with region in bottom left (bl) and
centre (c), vs injection rate in % of link bandwidth
[Chart: Average Latency, Region Traffic; x-axis: Output Rate (% of LBW), 1 to 8; y-axis: Latency (cycles), 33 to 58; series: apsra_bl_ap3, chiu_bl_ap3, apsra_c_ap4, chiu_c_ap4]
Fig. 9. Average latency for communications destined
to region in bottom left (bl) and centre (c), vs injection
rate in % of link bandwidth

We also give results for traffic destined only to
the region (see Fig. 9). In this case also, APSRA with
a central region shows the best performance in
terms of low latency. Here, however, Chen and
Chiu’s algorithm with a central region clearly gives
better results than both algorithms with the region at
the bottom left position. The worst performance is
again shown by Chen and Chiu’s algorithm with the
region in the bottom left corner.
Fig. 10 gives more detail about what causes the
difference in latency values. The diagrams show
how many routing cycles the packets were
blocked in different routers. These results are from
one of the simulations with 10 % packet injection rate,
where the difference in latency was very large. Note
that the scale of blocked routing cycles is not the same
in the two diagrams.


Fig. 10. Blocked routing cycles/router with (a)
APSRA algorithm and (b) Chen and Chiu’s algorithm

Fig. 10 (a and b) reveals that the APSRA algorithm
does not cause as much blockage as Chen and
Chiu’s algorithm. Note that Chen and Chiu’s algorithm
results in more blockages close to the north and west
borders of the region. The reason is that this path is
highly utilized by the algorithm when routing
around the region border. APSRA, on the other hand, is
not biased towards specific routes, and thus spreads the
traffic more evenly around the border. As APSRA in
many situations has several paths to select from, it is
also possible to avoid congested routes, which further
decreases the blockage.

Discussion on Results
The simulation results show that APSRA has an
overall advantage in communication latency for
identical traffic scenarios. This is probably an effect of
its unbiased behavior, which has less tendency to
create highly congested routes than Chen and Chiu’s
algorithm. In addition, the higher adaptivity of the
algorithm makes it possible to avoid congested routes.
This is especially visible in the results for the traffic not
destined to the region. In this case, a large difference is
shown between APSRA and Chen and Chiu’s
algorithm for the region in the centre.
Even though the average distance for APSRA is
slightly longer for a region in the centre, as indicated
by a somewhat higher latency at lower loads, APSRA
manages to keep communication below saturation up
to approximately 8%. For the same scenario, Chen and
Chiu’s algorithm has significantly higher latency.
Considering traffic to the region, the latency is
dominated more by the distance from the sources to the
destinations, which in this case is shorter with a
centrally placed region. As traffic to the region has a
probability of 60%, this also dominates the average
latency in the “all communications” case.

5. Conclusions

In this paper we have highlighted the importance of
the region concept in mesh topology NoC architecture.
We have also listed new issues which a designer will
encounter while designing a heterogeneous mesh
topology NoC system using multi-port or multi-access
point cores. We presented and compared two deadlock
free routing algorithms for mesh NoC with regions.
Our analysis and simulation based evaluation
demonstrate that minimal distance deadlock free
algorithms designed using the APSRA methodology
outperform the other algorithm, borrowed from the
fault tolerance area, in terms of adaptivity and latency.
However, the area of a NoC router required by the
APSRA based algorithm is expected to be larger than
that of the router for the other algorithm. This is because
APSRA requires tables within each router to store
routing information, whereas the other algorithm can
be implemented as an optimized FSM. The table based
implementation of the APSRA based algorithms could
also be an advantage because it allows configurability
(and even dynamic re-configurability) of routing
algorithms to efficiently handle modifications in
communication requirements of the running
applications. Future developments will mainly address
the definition of design space exploration strategies to
optimally determine region placement, shape, and
number of access points.
Acknowledgements
The research reported in this paper was supported by the
project, “Specialization and Evaluation of Network on Chip
Architectures for multi-media applications”, funded by the
Swedish K.K. Foundation. We thank Prof. Petru Eles for
valuable discussions and suggestions.

6. References

[1] Kumar, S., Jantsch, A., Soininen, J-P., Forsell, M.,
Millberg, M., Öberg, J., Tiensyrjä, K., Hemani, A.: A
network on chip architecture and design methodology.
In IEEE Annual Symposium on VLSI (April 2002)
[2] Dally, W.J., Towles, B.: Route Packets, Not Wires: On-
Chip Interconnection Networks. Design Automation
Conference (DAC), Las Vegas, NV (June 2001)
[3] Bolotin, E., Morgenshtein, A., Cidon, I., Ginosar, R.,
Kolodny, A.: Automatic Hardware-Efficient SoC
Integration by QoS Network on Chip. ICECS (2004)
[4] Pande, P.P., Grecu, C., Ivanov, A., Saleh, R.: Design of
a Switch for Network on Chip Applications, Proc. Int.
Symp. Circuits and Systems (ISCAS), vol. 5, pp. 217-
220, May 2003.
[5] Glass, C. J., Ni, L. M.: The turn model for adaptive
routing, Journal of the Association for Computing
Machinery, vol. 41, no. 5, pp. 874-902, 1994.
[6] Dally, W.J., Aoki, H.: Deadlock-free adaptive routing in
multicomputer networks using virtual channels. IEEE
Transactions on Parallel and Distributed Systems,
4(4):466-475 (April 1993)
[7] Boppana, R. V., Chalasani, S.: Fault-tolerant wormhole
routing algorithms for mesh networks. IEEE
Transactions on Computers, Vol. 44, No. 7 (1995)
[8] Wu, J.: A Fault-Tolerant and Deadlock-Free Routing
Protocol in 2D Meshes Based on Odd-Even Turn
Model. IEEE Trans. Computers 52(9):1154-1169 (2003)
[9] Chen, K-H., Chiu, G-M.: Fault-Tolerant Routing
Algorithm for Meshes without Using Virtual Channels.
Journal of Information Science and Engineering, Vol.14
No.4, pp.765-783 (December 1998).
[10] Duato, J.: A New Theory of Deadlock-Free Adaptive
Routing in Wormhole Networks. IEEE Trans. on
Parallel and Distributed Systems, 4(12): 1320-1331
(December 1993).
[11] Palesi, M., Holsmark, R., Kumar, S., Catania, V.:
APSRA: A methodology for design of application
specific routing algorithms for NoC systems. Technical
Report DIIT-TR-01-060406, Dip. di Ingegneria
Informatica e delle Telecomunicazioni, Univ. di Catania
(2006)
[12] Chiu, G.-M.: The Odd-Even Turn Model for Adaptive
Routing, IEEE Trans. on Parallel and Distributed Systems,
vol. 11, no. 7, pp. 729-738, 2000.
[13] Holsmark, R., Kumar, S.: Design Issues and
Performance Evaluation of Mesh NoC with Regions.
Norchip 2005, Oulu, Finland (November 2005)
[14] Palesi, M., Kumar, S., Holsmark, R.: A Method for Router
Table Compression for Application Specific Routing in
Mesh Topology NoC Architectures. SAMOS VI:
Embedded Computer Systems: Architectures, Modeling,
and Simulation, Samos, Greece (July 17-20, 2006)