An Evaluation of GOAL and GAL Routing
Algorithms: Aimed at Future Research
Kansas State University
Department of Electrical and Computer Engineering
February 25, 2007
Efficient routing algorithms are becoming increasingly necessary for productive
Massively Parallel Processing (MPP) computers, such as the Red Storm supercomputer
located at Sandia National Laboratories, NM. Some current implementations of MPP
computers use source-based routing, which, when a node fails, requires the
entire network to halt and recompute its routing tables. Dynamic routing
algorithms such as GOAL and GAL have performance characteristics
(throughput and latency) that rival those of source-based routing, but there is little data on how
well they perform when nodal failures exist. The work stemming from this research will
aim at evaluating GOAL and GAL routing algorithm performance in the presence of nodal
failures. This paper serves as a basis for that research.
Not long ago, efforts to increase performance within computing clusters
focused on processing speed. The focus has now shifted to message passing and
routing algorithms that can keep up with the processing speeds of these large clusters.
The Cray Red Storm MPP (Massively Parallel Processing) computer at Sandia National
Laboratories, for example, uses source-based routing, which looks up all routes in a
precomputed table. While this static routing implementation may work well under
perfect conditions, a problem arises when a node or link fails. A large cluster such as
Red Storm, with its 12,960 processing nodes, must then recalculate routes to and from
each node. This limits its availability and hinders its productivity.
This paper introduces two potential candidates for a
dynamic routing implementation for large computing clusters such as Red Storm.
The hope is that this paper and the future work building on it will lead to an
implementation of routing that proves to be stable, reliable, and more
productive than the source-based routing currently in use.
The rest of the paper is organized as follows. Section II discusses some of
the previous work in routing algorithms for toroid-mesh networks. Section III
covers the GOAL (Globally Oblivious Adaptive Locally) routing algorithm, and
section IV introduces the GAL (Globally Adaptive Load-Balanced) routing
algorithm. Section V closes with an outline for future work and
conclusions. References are listed in section VI.
The previous work on which this paper focuses is the pair of papers that
introduced the GOAL and GAL routing algorithms. These
two papers summarize much
of the work that has proven helpful in routing on
tori and mesh networks. This work seems particularly attractive to those currently
implementing static routing algorithms, but more tangible data may be needed to
motivate a change.
Evaluation of these two routing algorithms has focused largely on their
performance in terms of throughput and latency. I propose that this is not a sufficient
benchmark when considering large MPP architectures. Nodal failures are an area that
previous work does not appear to have considered.
Before the introduction of the GOAL and GAL routing algorithms, the CHAOS and
Minimal Adaptive routing algorithms were considered the best for dynamically routing
packets through a network. Minimal Adaptive routing chooses among the
minimal routes from source to destination using only local information about the network
(such as output queue lengths), making a decision at each hop. It does not use any global
information to assess global congestion.
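The per-hop decision of Minimal Adaptive routing can be sketched as follows. This is a minimal illustration, not the implementation from the cited work: it assumes a k-ary torus, and the names `minimal_directions`, `minimal_adaptive_hop`, and the `queue_len` mapping are hypothetical.

```python
def minimal_directions(src, dst, k):
    """Per dimension, return the step (+1, -1 or 0) along the shorter way
    around a ring of k nodes."""
    steps = []
    for s, d in zip(src, dst):
        forward = (d - s) % k          # hops needed going in the + direction
        if forward == 0:
            steps.append(0)            # already aligned in this dimension
        elif forward <= k - forward:
            steps.append(+1)           # + direction is minimal (ties go to +)
        else:
            steps.append(-1)
    return steps


def minimal_adaptive_hop(node, dst, k, queue_len):
    """Pick the productive output with the shortest local queue.

    queue_len maps (dimension, step) to the number of packets waiting on
    that output of the current node (hypothetical local state).
    Returns (dimension, step), or None if the packet has arrived.
    """
    steps = minimal_directions(node, dst, k)
    candidates = [(dim, step) for dim, step in enumerate(steps) if step != 0]
    if not candidates:
        return None
    # Only local queue lengths are consulted; no global congestion information.
    return min(candidates, key=lambda port: queue_len.get(port, 0))
```

For a node at (1,2) sending to (5,5) in a 16-ary 2-cube, both the +x and +y outputs are productive, and the hop goes to whichever has the shorter queue.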
The Chaos routing algorithm uses a deflection routing
scheme, randomly granting contending packets access to a channel. Packets that lose
the allocation to the channel are misrouted to a free output port, which may or may not be
a productive one.
According to the work that introduced them, the GOAL and GAL routing algorithms
outperform CHAOS and Minimal Adaptive. On this basis, this paper will
mainly focus on the GOAL and GAL routing algorithms.
The GOAL (Globally Oblivious Adaptive Locally) routing algorithm was
introduced by Singh et al. and has been a basis for improving routing algorithms on networks
with high bisection bandwidth and high path diversity.
GOAL routes a packet from source s = {s_1, …, s_n} to destination
d = {d_1, …, d_n} by obliviously choosing the direction to travel in each dimension
so as to exactly balance channel load. Within the chosen directions it then routes
adaptively, sending the packet in a direction that is still
productive and has the shortest queue. It differs from the Minimal Adaptive
algorithm by assigning each quadrant of the network a computable probability based
on a parameter determined by the size of the network, as well as on the distance
required in each
direction from source to destination.
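The oblivious direction choice can be sketched as follows. The weighting used here is an assumption based on the usual description of GOAL: in a ring of k nodes, a direction that needs Δ hops is chosen with probability (k − Δ)/k, so that the expected load placed on the two directions is equal. The function names are hypothetical.

```python
import random


def goal_direction_probabilities(s, d, k):
    """Probability of travelling in the + or - direction in one dimension."""
    delta_plus = (d - s) % k           # hops if the packet goes in the + direction
    if delta_plus == 0:
        return {+1: 0.0, -1: 0.0}      # no movement needed in this dimension
    delta_minus = k - delta_plus       # hops if the packet goes in the - direction
    # Assumed weighting: the shorter direction is proportionally more likely.
    return {+1: delta_minus / k, -1: delta_plus / k}


def choose_quadrant(src, dst, k, rng=random):
    """Obliviously pick a direction (+1, -1 or 0) in every dimension."""
    quadrant = []
    for s, d in zip(src, dst):
        probs = goal_direction_probabilities(s, d, k)
        if probs[+1] == 0.0 and probs[-1] == 0.0:
            quadrant.append(0)
        else:
            quadrant.append(+1 if rng.random() < probs[+1] else -1)
    return tuple(quadrant)
```

With k = 16, a packet that needs 4 hops in the + direction takes that direction with probability 0.75 and the 12-hop direction with probability 0.25; both choices contribute the same expected hop count (3) to their direction, which is the sense in which channel load is balanced.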
For example, if a packet must be sent from source node
s = (1,2) to destination
d = (5,5), there are multiple shortest paths. Depending on the queue lengths at
each intermediate node, a decision is made to route the packet onto a particular interface.
Figure 1. A section of a 16-ary 2-cube network graph
from source (1,2) to destination (5,5). Adapted from .
At the beginning of the path, the source
s examines the channel queues for each
of its interfaces that would propagate the packet in the correct direction. This is
designated as (+1,+1), which means the packet needs to be routed in the positive
x direction and the positive
y direction (assuming the graph resides in a standard x,y plane). This is also
known as the (+1,+1) quadrant. After the node determines which direction(s) it is able to
route in, keeping the minimal path in mind, it routes the packet based only on queue length.
If at source (1,2) the (+1, 0) queue (the
x-direction queue) is shorter
than the (0,+1) queue (the y-direction queue), then the next intermediate node
will be (2,2).
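The walk from (1,2) to (5,5) can be traced with a short sketch. It assumes the quadrant has already been chosen and that each node can report the queue length of its output in a given dimension; `route_in_quadrant` and the `queue_len` callback are hypothetical names, and the queue values in the usage note are made up for illustration.

```python
def route_in_quadrant(src, dst, quadrant, k, queue_len):
    """Walk from src to dst inside a fixed quadrant, one hop at a time.

    quadrant gives the direction (+1 or -1) to use in each dimension.
    queue_len(node, dim) returns the queue length of the output of `node`
    that leads in the quadrant's direction along dimension `dim`.
    """
    for dim in range(len(src)):
        if src[dim] != dst[dim] and quadrant[dim] not in (+1, -1):
            raise ValueError("quadrant must give a direction for every unaligned dimension")
    node = list(src)
    path = [tuple(node)]
    while tuple(node) != tuple(dst):
        # A dimension is productive while the packet is not yet aligned in it.
        productive = [dim for dim in range(len(node)) if node[dim] != dst[dim]]
        # Among the productive outputs, take the one with the shortest queue.
        dim = min(productive, key=lambda d: queue_len(tuple(node), d))
        node[dim] = (node[dim] + quadrant[dim]) % k
        path.append(tuple(node))
    return path
```

As a usage example, with `queues = {((1, 2), 0): 1, ((1, 2), 1): 4}` and `queue_len = lambda node, dim: queues.get((node, dim), 0)`, the call `route_in_quadrant((1, 2), (5, 5), (+1, +1), 16, queue_len)` takes its first hop to (2,2), because the x-direction queue at (1,2) is the shorter one, and reaches (5,5) after seven hops.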
By using three virtual channels per unidirectional physical channel, in a configuration similar
to the *-channels algorithm, GOAL is claimed to be deadlock free.
As explained in [1], there exist two types of channels, * and non-*. Packets move
through the * channels only when progressing in the most productive direction. The non-*
channels are usable at any time. The * channels consist of *a and *b channels: the *a channel is
used if the packet has not wrapped around, and *b is used if the packet has
wrapped around the network (considering a fully connected torus network). Because of
this arrangement, the channel dependencies are acyclic and the scheme is proven to be deadlock free. For the
specific proof, see .
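The channel rules can be sketched as a small selection function. This is one reading of the scheme, under stated assumptions: each physical channel carries one non-* channel plus the *a and *b channels, and "the most productive direction" is taken to mean a single designated productive dimension (here, the lowest-numbered one). The function name and the string labels are hypothetical.

```python
def allowed_virtual_channels(hop_dim, productive_dims, has_wrapped):
    """Return the virtual channels a packet may request for its next hop.

    hop_dim         -- dimension of the physical channel being requested
    productive_dims -- dimensions in which the packet still has hops to make
    has_wrapped     -- True if the packet has already crossed a wraparound link
    """
    allowed = ["non*"]                 # the non-* channel is usable at any time
    # Assumption: the * channels are reserved for hops in one designated
    # productive dimension, taken here to be the lowest-numbered one.
    if productive_dims and hop_dim == min(productive_dims):
        allowed.append("*b" if has_wrapped else "*a")
    return allowed
```

Separating *a from *b by whether the packet has wrapped around is what breaks the cycle that a ring would otherwise create among the * channels.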
In terms of performance evaluation, the GOAL authors have compared the GOAL
algorithm against other well-known algorithms. These are: Valiant’s Algorithm, which
routes packets to a random node anywhere in the network (phase 1) and then to the
destination (phase 2); Dimension Order Routing, which routes minimally in the x
dimension first, then the y dimension, then any further dimensions similarly;
Two-Phase ROMM, which routes to a random node in the minimal quadrant, then to the
destination; Randomized Local Balance, which chooses a quadrant to route in
according to a weighted probability distribution, then routes within that quadrant
first to a random intermediate node and then to the destination, randomizing the
order of matching dimensions; the
Chaos Routing Algorithm, as explained earlier; and finally the Minimal Adaptive
routing algorithm, which always routes in the minimal quadrant, routing adaptively within
it.
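Two of these baselines are simple enough to sketch directly. The code below is an illustration under assumptions, not the evaluated implementations: it assumes a k-ary torus, and it assumes that each phase of Valiant's algorithm uses dimension-order routing, which the text above does not specify. The function names are hypothetical.

```python
import random


def dor_path(src, dst, k):
    """Dimension Order Routing: finish dimension 0 (x), then 1 (y), and so on,
    taking the shorter way around the ring in each dimension."""
    node = list(src)
    path = [tuple(node)]
    for dim in range(len(node)):
        forward = (dst[dim] - node[dim]) % k
        step = +1 if forward <= k - forward else -1
        while node[dim] != dst[dim]:
            node[dim] = (node[dim] + step) % k
            path.append(tuple(node))
    return path


def valiant_path(src, dst, k, rng=random):
    """Valiant's algorithm: route to a random intermediate node (phase 1),
    then on to the destination (phase 2)."""
    intermediate = tuple(rng.randrange(k) for _ in src)
    phase1 = dor_path(src, intermediate, k)
    phase2 = dor_path(intermediate, dst, k)
    return phase1 + phase2[1:]         # drop the duplicated intermediate node
```

Dimension Order Routing always produces the same path for a given pair of nodes, whereas Valiant's algorithm spreads traffic over the whole network at the cost of longer paths.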
Using those other algorithms as benchmarks for the performance of the GOAL
algorithm, the authors evaluate their performance against adversarial traffic patterns. These
include the Nearest Neighbor, Uniform Random, Bit Complement, Transpose, Tornado, and
Worst-Case traffic patterns.
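Several of these patterns have simple closed forms on a k-ary 2-cube. The sketch below uses the common textbook definitions, which are assumptions here rather than the exact definitions used in the cited evaluation: bit complement assumes k is a power of two, and tornado is applied in the x dimension only. Nearest Neighbor and Worst-Case are omitted because their definitions vary between sources.

```python
import math
import random


def uniform_random(src, k, rng=random):
    """Every node is an equally likely destination."""
    return tuple(rng.randrange(k) for _ in src)


def bit_complement(src, k):
    """Complement every address bit; for k a power of two this maps each
    coordinate c to k - 1 - c."""
    return tuple(k - 1 - c for c in src)


def transpose(src):
    """Swap the x and y coordinates."""
    x, y = src
    return (y, x)


def tornado(src, k):
    """Send almost halfway around the ring in the x dimension."""
    x, y = src
    return ((x + math.ceil(k / 2) - 1) % k, y)
```

Tornado is adversarial for minimal routing because every packet travels the same way around the ring, which loads one direction heavily and leaves the other idle.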
After assessing the traffic patterns and ranking GOAL against the other routing
algorithms, the authors concluded that GOAL indeed achieves high
throughput on the
adversarial traffic patterns. It met or exceeded the throughput of the other routing
algorithms.
Unlike GOAL, the Globally Adaptive Load-Balanced (GAL)
routing algorithm uses global information to make routing decisions. It senses global
congestion using segmented injection queues. By monitoring traffic within quadrants of
the network, GAL
routes minimally while congestion is low, but when congestion
increases, it routes non-minimally within quadrants with lower congestion.
As shown in Figure 2, the network can be broken up into multiple quadrants, four
in this case (for a 2-dimensional
graph). The minimal quadrant is quadrant I, meaning
that the minimal path(s) exist within this quadrant. If congestion occurs and is noted
globally within this quadrant, then routing will instead take place non-minimally in
quadrant II, III, or IV.
Figure 2. Quadrants in a k-ary 2-cube
torus network. Adapted from .
The quadrant to route in is chosen in two steps. First, the source determines
which quadrant is the minimal quadrant (in the case of Figure 2, this would be quadrant
I). Then it checks whether that quadrant has a traffic occupancy
below some threshold. If the quadrant is above the threshold, it is discarded as a
potential routing quadrant, and the source chooses to route within an alternate
quadrant. From that point on, routing is done minimally within the selected quadrant.
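The two-step choice can be sketched as follows. The text does not say how the alternate quadrant is picked, so the order in which alternates are tried and the fallback to the least occupied quadrant are assumptions made for illustration; `choose_gal_quadrant` and its parameters are hypothetical names.

```python
def choose_gal_quadrant(candidates, occupancy, threshold):
    """Pick the quadrant to route in.

    candidates -- quadrant labels, with the minimal quadrant first
    occupancy  -- mapping from quadrant label to its current traffic occupancy
    threshold  -- occupancy at or above which a quadrant is discarded
    """
    # Try the minimal quadrant first, then the alternates in the given order.
    for quadrant in candidates:
        if occupancy[quadrant] < threshold:
            return quadrant
    # Assumption: if every quadrant is congested, fall back to the least occupied.
    return min(candidates, key=lambda quadrant: occupancy[quadrant])
```

With candidates `["I", "II", "III", "IV"]` and a threshold of 5, an occupancy of 2 in quadrant I keeps routing minimal, while occupancies of 9, 7, 1 and 0 send the packet to quadrant III, the first alternate below the threshold.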
This implementation also requires three virtual channels per unidirectional
physical channel, just like GOAL. Because it uses the same *-channel configuration as
GOAL, the deadlock-freedom proof is the same as the one referenced earlier in this paper, available in .
In terms of performance, the authors introduce a report card of the throughput
achieved by each of the cutting-edge routing algorithms. Table 1 below shows this
report card.
Table 1. Report Card of Four Adaptive Routing Algorithms, reporting throughput. Borrowed from .
Based upon the table and the authors’ evaluation of GAL’s performance, this
routing algorithm seems to perform very well under adversarial traffic patterns.
Given the need for low latency and high throughput, GAL appears to be
a strong candidate for networks with these requirements.
Future Work and Conclusions
The work on GOAL and GAL has opened up many doors for research
in the area of adaptive routing algorithms. For national laboratories
such as Sandia and Lawrence Livermore (among others), this type of research is
appealing because of the algorithms’ usability on tori and mesh networks. Considering
Red Storm’s last known routing implementation, source-based routing, substantial
data is required in order to justify a change in routing techniques to a more dynamic
and more reliable alternative.
The work stemming from this paper will cater
more to the needs of
Sandia National Laboratories and the Red Storm MPP (Massively Parallel Processing)
supercomputer by taking the information presented in the GOAL and GAL work and adding data
on GOAL’s and GAL’s performance in the presence of nodal failures. Because of
the large number of processing nodes, this is a real issue within large MPP computers.
Nodal failure probabilities may be low, but with over 10,000 processing
nodes, the failure of one or more nodes can cripple computation when source-based
routing is used. The alternative, a dynamic and more reliable routing algorithm,
may exist within or stem from the GOAL and GAL routing algorithms. Future work
will expand on this thought.
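The effect of node count on failure exposure can be made concrete with a small calculation. It assumes that nodes fail independently with the same probability over some time window, which is a simplification, and the per-node probability used in the usage note is a hypothetical value, not a measured Red Storm figure.

```python
def prob_any_failure(p_node, n_nodes):
    """Probability that at least one of n_nodes independent nodes fails,
    given that each fails with probability p_node in the same window."""
    return 1.0 - (1.0 - p_node) ** n_nodes
```

For example, with a hypothetical per-node failure probability of 0.0001 in a given window, `prob_any_failure(0.0001, 12960)` is roughly 0.73. Even a very reliable node therefore implies frequent route recomputation at the scale of 12,960 nodes.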
To test GOAL and GAL under nodal failures, the first step
would be to simulate such an environment. The simulations
would focus on benchmarking the GOAL and GAL routing algorithms against
Chaos, Minimal Adaptive, and source-based
routing in terms of availability,
throughput, and latency when random nodal failures exist.
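A first step toward such a simulation could look like the sketch below. It is only a starting point under stated assumptions: it models a k-ary 2-cube torus, fails nodes uniformly at random, and measures connectivity between surviving nodes rather than the behavior of any particular routing algorithm. All names are hypothetical.

```python
import random
from collections import deque


def torus_neighbors(node, k):
    """The four neighbors of a node in a k-ary 2-cube with wraparound links."""
    x, y = node
    return [((x + 1) % k, y), ((x - 1) % k, y), (x, (y + 1) % k), (x, (y - 1) % k)]


def reachable_from(start, k, failed):
    """Breadth-first search over the nodes that have not failed."""
    seen = {start}
    frontier = deque([start])
    while frontier:
        node = frontier.popleft()
        for nxt in torus_neighbors(node, k):
            if nxt not in failed and nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen


def connected_pair_fraction(k, n_failures, rng):
    """Fraction of ordered pairs of surviving nodes that can still reach
    each other after n_failures random node failures."""
    nodes = [(x, y) for x in range(k) for y in range(k)]
    failed = set(rng.sample(nodes, n_failures))
    alive = [n for n in nodes if n not in failed]
    total = len(alive) * (len(alive) - 1)
    if total == 0:
        return 0.0
    connected = sum(len(reachable_from(src, k, failed)) - 1 for src in alive)
    return connected / total
```

This gives an upper bound on what any routing algorithm could deliver after failures. Comparing GOAL, GAL and source-based routing would additionally require modelling queues, virtual channels and route recomputation time on top of this topology.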
[1] A. Singh, W. Dally, A. Gupta, B. Towles. GOAL: A Load-Balanced Adaptive Routing Algorithm for Torus Networks. Proceedings of the 30th Annual International Symposium on Computer Architecture, IEEE, 2003.
[2] A. Singh, W. Dally, B. Towles, A. Gupta. Globally Adaptive Load-Balanced Routing on Tori. IEEE Computer Society, 2004.
[3] Sandia National Laboratories, Computation, Computers, Information and Mathematics. Retrieved: 22 Feb, 2007. Last modified: 16 Feb, 2007.
[4] W. Dally, B. Towles. Principles and Practices of Interconnection Networks. Chapter 11, Routing Mechanics. © 2004.
[5] W. Camp. Petascale Computing Architectural Requirements. Sandia National Laboratories, Computation, Computers, Information and Mathematics. SOS Workshop, Maui, Hawaii. March 2006.
[6] W. Dally, B. Towles. Principles and Practices of Interconnection Networks. Chapter 10, Adaptive Routing. © 2004.