Pagination for On-line Diagnosis and Reconfiguration of FPGAs




William Fornaciari (1), Vincenzo Piuri (2), Nello Scarabottolo (2)

(1) Department of Electronics and Information, Politecnico di Milano, piazza L. da Vinci 32, 20133 Milano, Italy, phone +39-2-2399-3504, fax +39-2-2399-3411, email fornacia@elet.polimi.it

(2) Department of Information Technologies, University of Milan, via Bramante 65, 26013 Crema (CR), Italy, phone +39-0373-898-{242,245}, fax +39-0373-898-253, email {piuri, scarabottolo}@dti.unimi.it



Abstract

The design of fault-tolerant FPGA-based architectures is of increasing interest for mission-critical applications. A novel approach is proposed to introduce concurrent detection, diagnosis, and reconfiguration in the early design steps. The adopted technique aims to reduce the circuit complexity required for redundancy and to simplify reconfiguration, as well as to increase system survival, possibly by accepting performance degradation.

1. Introduction

FPGA technology is nowadays widely used to implement complex systems, including mission- and life-critical applications in which the environment is harsh, maintenance is difficult, expensive, or even impossible, or continuous operation is a mandatory requirement. Examples can be found in space satellites, automotive, avionics, plant control, and telecommunications.

Three basic approaches to introducing fault tolerance in systems implemented with memory-based FPGAs have been considered in the literature so far.

The first approach (architectural approach) to fault-tolerant FPGA systems tackles concurrent error detection and the subsequent reconfiguration by considering the FPGA at a low abstraction level, from the point of view of the uncommitted physical structure as it appears in the VLSI device. In this case, the FPGA is considered as a matrix of basic processing elements (the CLBs of the device, including uncommitted logic functions and possible memory elements) linked by switched interconnecting buses. The specific characteristics of the application system mapped onto the FPGA are not taken into account. Conventional techniques [1,2] are used for detection (e.g., physical modular redundancy, based on executing the computation of each CLB in parallel and comparing the results [3]; time redundancy, based on repeating the computation on different CLBs and comparing the results [4]; data coding). Comparators are distributed through the FPGA for on-line verification. When an error is detected, the faulty FPGA block is identified by analyzing the active error signals. The faulty block is then removed from the active computation and replaced by spare elements, so as to preserve the nominal detection ability as long as possible. To this end, the interconnection structure is reconfigured to bypass the faulty block by using one of the many reconfiguration techniques long established in the literature for arrays of processors. Interconnections and configuration memory are considered fault free.

The main drawback of this approach, when modular redundancy or data coding is used, is the high circuit complexity due to the hardware redundancy needed to support detection. Circuits must be duplicated in modular redundancy, while extra circuits are needed for data coding. In addition, checkers must be introduced for each FPGA basic block. This redundancy (especially for checkers) is very large: since the circuit complexity of the protected processing element is very small, the relative cost of checking becomes significant at such a fine checking granularity. A similar overhead occurs in time when time redundancy is considered. As far as reconfiguration is concerned, large additional circuit complexity is usually not needed, since the routing features natively available in FPGA devices are exploited. This is feasible only if sufficient unused interconnections remain after the nominal application circuit has been mapped onto the FPGA device. Reconfiguration may in fact run short of redundant paths, so that the new configuration and the rerouting that bypasses the fault cannot be completed. This problem may be significant since, with a fine checking granularity, the spare interconnections to be used for reconfiguration might not be available in the routing channels because they are used for other nominal links. Moreover, to implement fast reconfiguration and to avoid (or at least limit) additional circuit redundancy for reconfiguration, the rerouting policy should be simple enough or predefined. If reconfiguration is driven by the error signals generated by the checkers and the configuration is computed on board, additional circuit complexity must be considered, since the rerouting strategy must be embedded with the nominal circuit. If reconfiguration is decided outside the FPGA, the error signals coming from the individual blocks must be carried out of the FPGA for analysis. In such a case, the FPGA configuration can either be generated dynamically by applying the reconfiguration rules or be pre-computed, stored, and then directly applied (thus requiring a large amount of memory to store all desired configurations). In both cases, some additional circuit complexity and routing ability are required. Another drawback is that interconnections should not be considered fault free, especially when they contain several switches to route signals. Finally, none of the available solutions considers the protection of the configuration memory in harsh environments and mission-critical applications.

The second approach (application approach) to introducing fault tolerance in FPGAs adopts a high-level view of the system. In this approach the desired application system is dealt with independently of the fact that it is implemented on an FPGA (e.g., [5]). Fault detection techniques are introduced at the application (system) level by using, e.g., modular redundancy, time redundancy, data coding, or time-shared modular redundancy. Intermediate or final application outputs are analyzed for error checking, according to the adopted detection technique. There is no relationship between the location of the checking and the array structure of the underlying FPGA device. Whenever an error is detected, the faulty CLB in the FPGA must be identified. Appropriate diagnostic techniques must therefore be introduced to locate the faulty processing element (e.g., [6-16]). Diagnosis can be performed by considering the physical FPGA, without regard to the application circuit mapped on the FPGA device. Reconfiguration can then be accomplished by excluding the faulty processing element from the active computation through interconnection reconfiguration and spare activation.

In this approach circuit complexity and performance are affected only by the system-level use of the checking technique. Usually this leads to a much lower redundancy than in the previous case. However, since individual CLBs are not checked on-line, subsequent diagnosis is needed for fault location. This results in a longer repair time, since appropriate testing must be executed. Since system checking is performed on-line, diagnosis is likely to be performed on board as well: in this case, additional circuits must be introduced. This redundancy can easily become quite large with respect to the protected circuits. Subsequent reconfiguration may lack sufficient routing paths to bypass the faulty processing element, although this case is less likely than in the low-level approach due to possible complex rerouting.

The third approach (physical approach) augments the physical structure of the FPGA device to directly support fault detection and confinement [17-23]. Basic CLBs and interconnections are enhanced and made more robust so that internal checking can perform detection in each CLB.

This is highly effective and efficient, but requires a large circuit complexity. Moreover, and much more importantly, the structure of existing FPGA devices must be modified to produce a fault-tolerant version, thus increasing costs. This solution is therefore not very desirable when COTS architectures are envisioned, even for mission-critical applications.

Several other works published in the literature [19,20,24-32] focus on reconfiguration to achieve efficient reorganization of the mapping of the application system onto the FPGA. These techniques are useful for this second phase, but do not consider the realization of a comprehensive strategy for on-line fault tolerance.

2. The functional partition approach

This paper introduces an innovative approach (called functional partition approach) that aims to limit the circuit complexity required by concurrent detection and reconfiguration, while providing high detection ability, in systems described by sequencing graphs [33] mapped on memory-based FPGAs (e.g., the Xilinx 6000 Family).

This approach takes into account at the same time both the array structure of the FPGA, in order to regularize the detection framework and the reconfiguration management, and the functional operation of the application circuit to be mapped on the FPGA device, in order to minimize the checking points to be inserted. The approach is essentially based on the optimal use of the desired detection technique at a high abstraction level and on the functional partitioning of the resulting circuit into logical circuit partitions that fit into reasonably large physical partitions of the FPGA device. The coarse granularity of the partitions and the intelligent (since application-dependent) use of checking points lead to a limited overhead in circuit complexity and performance degradation, while preserving an efficient reconfiguration ability. With respect to the architectural approach, the proposed solution limits the required redundancy, since larger portions of the device are viewed as the processing elements to be treated as atomic in the detection and reconfiguration strategies. With respect to the application approach, the proposed technique performs diagnosis concurrently with checking, thus limiting the overall repair time; only extraction and analysis of the few error signals produced by the partitions are needed.

The basic aims of the proposed solution are on-line detection, low circuit complexity required for checking, a small number of spare interconnection links required to reroute data during reconfiguration, fast fault localization, and efficient support of reconfiguration.

The approaches adopted to tackle the above issues were inspired by the concepts of pagination and segmentation that were introduced to support dynamic FPGA configuration, as a technique to implement virtual FPGAs [34-36]. These techniques were originally designed to provide dynamic reconfiguration of FPGA architectures while they are operating, so as to map computations that exceed the size of the FPGA devices, to share the FPGA among several processes in a multitasking environment, or to adapt the functionalities of the FPGA dynamically to the current application needs.

The coarse granularity used for reconfiguration limits the number of checking points and the number of interconnections required by reconfiguration. This granularity can be achieved by dividing the desired computation into segments (portions having variable complexity) and, possibly, pages (portions having identical predefined complexity). On-line detection verifies the correctness of the computation performed by segments and pages. A network of control signals supports on-line localization of faulty parts. External dedicated devices are used to dynamically change the FPGA configuration to exclude the faulty components.

3. On-line detection

To implement on-line detection for the sequencing graphs we separate the datapath from the control path. Since the control path consists of an FSM, protection is achieved by applying one of the techniques available in the literature for self-checking FSMs [37,38]. Duplication with comparison is straightforward, but has a high redundancy. Lower circuit complexity can be achieved by using redundant state representations.
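As an illustration of a redundant state representation, the following Python sketch models a one-hot state register, in which any single bit flip yields an invalid code word that a checker can flag. The one-hot encoding and the function names are assumptions made for this example only; the paper does not prescribe a specific encoding (see [37,38] for actual self-checking FSM techniques).

```python
def is_valid_one_hot(state_bits: int, width: int) -> bool:
    """Return True if exactly one of the `width` state bits is set."""
    in_range = 0 < state_bits < (1 << width)
    # A power of two has a single bit set, so x & (x - 1) is zero.
    return in_range and (state_bits & (state_bits - 1)) == 0


def single_bit_flips(state_bits: int, width: int) -> list:
    """Enumerate the state words obtained by flipping one bit."""
    return [state_bits ^ (1 << i) for i in range(width)]


state = 0b0100                      # hypothetical 4-state FSM, third state active
corrupted = single_bit_flips(state, 4)
all_detected = not any(is_valid_one_hot(s, 4) for s in corrupted)
```

In this model every single-bit corruption of a valid state is flagged, at the cost of one flip-flop per state instead of a binary-encoded register.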

Our main focus is on protection of the datapath. To this aim, we can consider any of the approaches proposed in the literature, namely physical modular redundancy, time redundancy, data coding, and time-shared modular redundancy (e.g., [1,2,39-42]). Both concurrent and semi-concurrent solutions can be adopted, according to the actual need to certify the correctness of every output. These techniques are applied to the datapath without any specific concern for its mapping onto the FPGA architecture. Circuit redundancy, latency increase, and throughput reduction are those of the adopted approach to on-line detection.

To reduce the number of checking points while preserving the desired detection capacity, especially in dedicated architectures, some high-level design approaches have been proposed that fully exploit the detection ability of modular redundancy or data coding (e.g., [39-42]). These solutions analyze the application algorithm described as a sequencing graph and identify the locations where checking is required to avoid aliasing for the adopted detection technique. These approaches partition the application system into functional blocks (called detectable subgraphs) that need to be protected only at their boundaries, while propagation toward the block outputs of errors due to faults occurring inside them is guaranteed. By placing the error checkers only at the designated outputs of the detectable subgraphs that are needed to completely cover the application graph, we minimize the overall number of checkers needed by concurrent detection and, in turn, we reduce the circuit complexity introduced by fault tolerance. Appropriate strategies (such as those reported in [39-42]) are envisioned to allocate and schedule the checking operations in the sequencing graph when reuse of functional units is allowed to reduce the circuit complexity of the application circuit.
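The boundary placement of checkers can be sketched as follows. This is a minimal Python illustration, assuming that the partition into detectable subgraphs is already given (the aliasing analysis of [39-42] that produces it is not reproduced) and that every value leaving a subgraph is a designated output; the graph, the subgraph labels, and the function name are hypothetical.

```python
def checker_points(edges, subgraph_of, primary_outputs):
    """Return the operations whose results must be checked.

    edges: (producer, consumer) pairs of the sequencing graph
    subgraph_of: operation -> detectable subgraph identifier
    primary_outputs: operations that produce application outputs
    """
    points = set(primary_outputs)
    for src, dst in edges:
        # A value crossing a subgraph boundary is checked at its source.
        if subgraph_of[src] != subgraph_of[dst]:
            points.add(src)
    return sorted(points)


# Hypothetical sequencing graph with five operations in two subgraphs.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("b", "e"), ("e", "d")]
subgraph_of = {"a": "G1", "b": "G1", "c": "G2", "e": "G2", "d": "G2"}
points = checker_points(edges, subgraph_of, primary_outputs={"d"})
```

Here only two of the five operations carry a checker, which is the saving that boundary-only checking is meant to provide.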

A functional unit is never reused to map different operations belonging to the same detectable subgraph, in order to guarantee the absence of aliasing and, in turn, error detection. A functional unit can be reused to map different operations in disjoint detectable subgraphs without incurring aliasing, since the possible presence of errors in the results computed by the subgraphs is independently checked and the error due to a fault in such a unit can affect the computation performed by each subgraph at most once. Reuse of functional units reduces the overall circuit complexity.
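The reuse rule can be checked mechanically on a candidate binding. The sketch below is in Python and assumes a hypothetical binding of operations to functional units and a hypothetical assignment of operations to detectable subgraphs; none of these names come from the paper.

```python
from collections import defaultdict


def reuse_violations(binding, subgraph_of):
    """Find units used more than once inside one detectable subgraph.

    binding: operation -> functional unit
    subgraph_of: operation -> detectable subgraph identifier
    Returns {(subgraph, unit): [operations]} for each violation.
    """
    users = defaultdict(list)
    for op, unit in binding.items():
        users[(subgraph_of[op], unit)].append(op)
    return {key: sorted(ops) for key, ops in users.items() if len(ops) > 1}


# ADD0 is shared across G1 and G2 (allowed); MUL0 is used twice in G1.
binding = {"a1": "ADD0", "m1": "MUL0", "a2": "ADD0", "m2": "MUL0", "a3": "ADD1"}
subgraph_of = {"a1": "G1", "m1": "G1", "a2": "G2", "m2": "G1", "a3": "G2"}
violations = reuse_violations(binding, subgraph_of)
```

An empty result means the binding respects the rule; reuse across disjoint subgraphs, such as ADD0 above, is not reported.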

Two disjoint detectable subgraphs whose operations are mapped on different groups of functional units (i.e., there are no shared units) are linked to each other at most by data flowing across their shared borders. This greatly increases the independence between the mapping locations of these subgraphs in the FPGA and, consequently, favors their relocation in the FPGA architecture. In fact, mapping operations belonging to disjoint detectable subgraphs on the same shared functional unit requires rerouting not only the data links between the shared borders of the subgraphs, but also the inputs and the outputs of such a unit; this needs more FPGA interconnection resources and thus makes routing much more difficult.

On the other hand, requiring no reuse of functional units increases the overall circuit complexity. However, this is not always a problem: in many computing-demanding applications high performance intrinsically has to be achieved by massively parallelizing the computation. If short latency is needed, operations are mapped on different functional units. If high throughput is required, pipelining is adopted to reduce the clock cycle, but functional units cannot be reused in different pipeline stages. In all these cases, the nature of the application implicitly leads to no reuse of the functional units. Examples can be found in several real-time signal and image processing applications, data communication channels, and multimedia data compression.

When reuse is feasible and desired to reduce the circuit complexity, the designer should look for the most suitable tradeoff between these two conflicting issues, according to application needs and constraints: functional unit reuse reduces the circuit complexity but increases the FPGA interconnection complexity, and vice versa.

4. Reconfiguration

In this paper we focus our attention on the datapath. Fault location and reconfiguration strategies for system survival are here envisioned for the datapaths of the sequencing graph that describes the computation.

The group of functional resources needed to perform the computation of a detectable subgraph is called a logical segment. In general, logical segments differ in circuit complexity, since partitioning has been applied at a functional level by considering only the actual need to introduce checkers to avoid aliasing according to the adopted detection technique.

First, let us consider the case in which the various logical segments have approximately the same size, i.e., they require about the same number of CLBs to be implemented in the FPGA.

When a logical segment is mapped onto the FPGA, we identify a specific group of physical FPGA resources that perform the computation of the segment. To provide fault tolerance we need to be able to reconfigure the mapping of segments onto the FPGA in order to exclude the faulty physical FPGA resources from the active computation. For this purpose we need an abstract view of the segment mapping that is independent of the actual physical FPGA resources used to perform the computation. Specifically, we need to define the mapping of the segment onto a portion of the FPGA in a way that can be relocated across the FPGA resources. Therefore we distinguish the group of physical FPGA resources that actually perform the computation from the group of logical FPGA resources that are associated with the segment operations but can still be relocated on the physical FPGA structure. The group of logical FPGA resources that are needed to implement a logical segment but are relocatable onto the physical FPGA architecture is called a logical tile.

Without loss of generality, we assume that all logical tiles have the same rectangular shape, i.e., their CLBs occupy the same kind of rectangular structure (otherwise, all segments can be reshaped to the same rectangle by reallocating operations and by suitable routing).

The application circuit to be mapped on the FPGA appears as a collection of rectangular logical tiles (Fig. 1). Each logical tile is, by construction, self-checking, since it implements a logical segment of the application circuit obtained as described in Section 3.

To introduce the reconfiguration ability, we view the FPGA device as partitioned into physical tiles of adjacent CLBs, each physical tile having the size and the shape of the logical tiles. The FPGA device must contain at least as many physical tiles as logical ones if the application circuit is to operate at its maximum performance. The physical tile is the atomic block considered during reconfiguration to exclude the faulty FPGA resource from the active computation.

Mapping the application circuit (including the self-checking features) consists of mapping each logical tile onto a physical tile. Mapping and, thus, FPGA configuration are successful, and the application circuit will operate at its maximum performance, if all logical tiles can be accommodated on fault-free physical tiles and signal routing among them can be completed by using fault-free interconnections (Fig. 2).

Fig. 1. The logical view of the FPGA: (a) the partitioned FPGA (logical tiles made of CLBs), (b) the logical tile containing a logical segment.


If spare logical tiles and spare interconnections are left unused in the FPGA device after mapping the application circuit, they can be used by reconfiguration to tolerate further faults.

When an error is detected in a logical segment, the error signal produced by the checker in the logical tile containing that segment becomes active. A suitable network can be introduced in the logical tiles to propagate the error signal toward the external circuits that control the reconfiguration (Fig. 3). Redundancy by duplication can be adopted to provide self-checking features to the error propagation network. By analyzing the values of the error signals and taking into account the direction in which the computation propagates, we can deduce the physical tile in which the fault occurred (other error signals may become active if the error is propagated to other tiles and is detected there as well).
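The localization step can be sketched in Python as follows, under two assumptions that are not stated in the paper: a single fault is present, and the data flow among tiles is known to the external controller as a directed graph. A tile is then suspected only if its error signal is active and no tile upstream of it has raised an error. Tile names and the function name are hypothetical.

```python
def locate_faulty_tiles(active, feeds):
    """Return the tiles whose error is not explained by an upstream error.

    active: set of tiles whose error signal is raised
    feeds: tile -> list of tiles that consume its outputs
    """
    # Build the reverse (consumer -> producers) relation.
    producers = {}
    for src, dsts in feeds.items():
        for dst in dsts:
            producers.setdefault(dst, set()).add(src)

    def has_active_ancestor(tile):
        stack, visited = list(producers.get(tile, ())), set()
        while stack:
            t = stack.pop()
            if t in visited:
                continue
            visited.add(t)
            if t in active:
                return True
            stack.extend(producers.get(t, ()))
        return False

    return sorted(t for t in active if not has_active_ancestor(t))


# Hypothetical data flow: T0 -> T1 -> {T2, T3} -> T4.
feeds = {"T0": ["T1"], "T1": ["T2", "T3"], "T2": ["T4"], "T3": ["T4"]}
suspects = locate_faulty_tiles({"T1", "T3", "T4"}, feeds)
```

In this example the errors raised by T3 and T4 are attributed to propagation from T1, which is the only tile reported as faulty.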


Fig. 2. Mapping of logical on physical tiles. Legend: logical tile mapped on a fault-free physical tile; faulty physical tile; spare logical tile.




Fig. 3. The error signal propagation in a tile (horizontal and vertical error signal inputs, horizontal and vertical error signal outputs, tile error signal).


The external reconfiguration controller applies the reconfiguration algorithm, which identifies a new mapping of logical tiles onto physical tiles that excludes the faulty physical ones, by taking into account the current fault distribution in the FPGA device, the distribution of spare physical tiles, and the availability of spare interconnection links.

To identify the new mapping of tiles during reconfiguration, we can adopt any of the algorithms proposed in the literature [19,20,24-32,43] that is compatible with the distribution of spare physical tiles and unused interconnection links.

Routing can be accomplished by simply adjusting the interconnections locally or by performing a completely new routing, according to the position of the spare physical tiles and the new tile mapping. It is worth noting that only the inter-tile connections (i.e., the links between logical segments) condition the success of the new mapping and routing. Intra-tile connections among FPGA resources do not influence the tile mapping and routing since, by construction, they are available and fault free when a logical tile is moved onto a spare fault-free physical one. Additional opportunities for successful rerouting could be obtained by exploiting unused interconnection links within the physical tiles; however, this may greatly increase the complexity of the routing algorithm, since it must also know and deal with the internal structure of each logical tile.

Reconfiguration is immediately applied by modifying the FPGA configuration memory. To re-map a logical tile we simply have to move the corresponding portion of the FPGA configuration memory from the location associated with the original mapping to the location associated with the destination one. To adapt the interconnections, the switches controlling the interconnection links among physical tiles must be configured according to the new routing. When logical tiles are simply shifted sideways to empty a faulty physical tile, the new routing can be easily obtained by shifting the links in the same direction. If enough spare interconnection links are available, simple standard redirection rules can be applied to generate the new routing.
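The sideways shift can be sketched on a single row of physical tiles. This Python example is only an illustration under simplifying assumptions: spares are sought to the right of the fault only, routing is ignored, and the data layout (`row`, the `FAULTY` marker, the function name) is hypothetical rather than taken from the paper or from the algorithms in [19,20,24-32,43].

```python
FAULTY = "<faulty>"  # marks a physical tile excluded from the computation


def shift_row(row, faulty):
    """Return a new row with slot `faulty` excluded, or None if no spare exists.

    row[i] is the logical tile held by physical tile i, None for a spare
    physical tile, or FAULTY for a tile excluded by an earlier fault.
    """
    new_row = list(row)
    if row[faulty] is None:          # a spare failed: nothing has to move
        new_row[faulty] = FAULTY
        return new_row
    moving = [row[faulty]]           # logical tiles that must move, in order
    targets = []                     # physical slots that will receive them
    for i in range(faulty + 1, len(row)):
        if row[i] == FAULTY:
            continue                 # skip tiles excluded by earlier faults
        targets.append(i)
        if row[i] is None:
            break                    # reached a spare: the shift ends here
        moving.append(row[i])
    else:
        return None                  # no spare to the right of the fault
    new_row[faulty] = FAULTY
    for tile, slot in zip(moving, targets):
        new_row[slot] = tile
    return new_row


first = shift_row(["A", "B", "C", None], faulty=1)
```

Each moved logical tile corresponds to one block copy in the configuration memory; a `None` result signals that no spare is reachable and another strategy is needed.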

It is now clear that reconfiguration becomes very simple when there are links only between logical tiles, i.e., when no functional resource is shared between logical segments (each containing a disjoint detectable subgraph). In such a case logical tiles can be relocated without any concern for their internal configuration, caring only about the interconnections that compose the external interface of the logical tiles. Conversely, when functional resources are shared among logical segments, a shared resource is implemented in only one of the physical tiles; suitable routing is needed to carry signals among the related physical tiles, thus making the external interface of each tile more complex and much more irregular. This, in turn, makes reconfiguration much more difficult, especially as far as routing is concerned.

5. Remarks on the logical segment size

The above scenario is feasible and effective only if the size of the logical segments (and, in turn, that of the logical tiles) is not too large with respect to the size of the FPGA device. If logical tiles are too large, only a few tiles can be created. When a fault occurs, the whole physical tile containing the fault is declared faulty and is excluded from the active computation. This wastes a great number of fault-free CLBs. Moreover, the small number of spare physical tiles reduces the opportunities for successful reconfiguration and, thus, system survival, especially in the case of subsequent faults.

Conversely, the size of logical segments should not be too small, in order to limit the share of the global circuit complexity taken by the redundant circuits for error detection and dynamic reconfiguration, and to limit the need for spare links for rerouting.

If segments are too small, we can try to put several disjoint detectable subgraphs into the same logical segment and, thus, into the same logical tile. The error signals generated by the checkers associated with the outputs of each detectable subgraph in a logical tile are merged to generate the tile error signal. If the resulting logical tiles have identical and reasonable size, the case discussed in Section 4 applies again. A logical tile containing several detectable subgraphs is managed as a simple tile containing only one detectable subgraph: during reconfiguration it is re-mapped as a whole onto the newly selected physical tile.

If the logical segments have highly different sizes, we can try to merge some of them as discussed above to create larger logical tiles of quasi-homogeneous size. If the resulting logical tiles have reasonable size, we are again in the case considered in Section 4.
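Merging can be viewed as a packing problem. The paper does not prescribe a merging algorithm; the Python sketch below swaps in first-fit decreasing, a standard bin-packing heuristic, and assumes that segment sizes are given in CLBs and that the tile capacity has been fixed by the designer. All names and numbers are hypothetical.

```python
def merge_segments(sizes, tile_capacity):
    """Group logical segments into logical tiles of bounded size.

    sizes: segment name -> number of CLBs
    tile_capacity: number of CLBs available in one tile
    Returns a list of tiles, each a list of segment names.
    """
    tiles = []  # each entry is [free CLBs, [segment names]]
    # Place the largest segments first (ties broken by name).
    for seg, size in sorted(sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        if size > tile_capacity:
            raise ValueError(f"{seg} exceeds the tile size and must be split")
        for tile in tiles:
            if tile[0] >= size:      # first tile with enough room
                tile[0] -= size
                tile[1].append(seg)
                break
        else:
            tiles.append([tile_capacity - size, [seg]])
    return [segs for _, segs in tiles]


sizes = {"S1": 30, "S2": 12, "S3": 20, "S4": 8, "S5": 10}
tiles = merge_segments(sizes, tile_capacity=40)
```

A segment larger than the tile capacity raises an error in this sketch, which corresponds to the case where splitting is required instead of merging.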

Let us now consider the cases in which the logical segments are too large, or have highly different sizes and merging fails to produce homogeneous logical tiles. In these cases, we have to split the large segments in order to allow reconfiguration. Splitting a logical segment implies that we need to introduce additional check points to verify the computation of each portion of the segment separately. We also need additional spare links to guarantee routing. This increases the complexity required for fault tolerance.

To perform splitting, homogeneous-size portions that fit into the logical tiles must be obtained. This can be accomplished by adopting the pagination approach to virtual FPGAs [34,35]. In this case, the circuit contained in the logical segment is divided into logical pages, all having the same size and shape (suitably defined a priori by the system designer). Circuit partitioning is performed so as to minimize the number of connections among pages. Each logical page is mapped onto a physical tile of the FPGA device as discussed for segments in Section 4. Error detection, fault location, and reconfiguration are performed in the same way as for logical segments.
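A naive form of pagination can be sketched as follows. This Python example fills pages sequentially with operations taken in a given order and then counts the connections that cross page boundaries; it does not minimize that count, which a real partitioner would have to do. Operation names, CLB costs, and the page size are hypothetical.

```python
def paginate(ops, cost, page_size):
    """Split an ordered list of operations into pages of at most page_size CLBs."""
    pages, current, used = [], [], 0
    for op in ops:
        if cost[op] > page_size:
            raise ValueError(f"{op} does not fit into a single page")
        if used + cost[op] > page_size:
            pages.append(current)    # close the full page and open a new one
            current, used = [], 0
        current.append(op)
        used += cost[op]
    if current:
        pages.append(current)
    return pages


def cut_connections(pages, edges):
    """Count the connections whose endpoints lie in different pages."""
    page_of = {op: i for i, page in enumerate(pages) for op in page}
    return sum(1 for u, v in edges if page_of[u] != page_of[v])


ops = ["a", "b", "c", "d", "e"]
cost = {"a": 4, "b": 3, "c": 5, "d": 2, "e": 6}
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "e"), ("d", "e")]
pages = paginate(ops, cost, page_size=8)
cuts = cut_connections(pages, edges)
```

The cut count is the quantity to be minimized, since each crossing connection needs an inter-tile link and possibly an additional check point.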

6. Continuous output correctness

Although detection and reconfiguration can be performed on line by using appropriate circuits, the outputs delivered by the FPGA are not, in general, continuously correct. On-line detection only certifies correctness; it does not prevent delivery of erroneous results in the presence of faults. On-line reconfiguration restructures the mapping of logical operations onto the FPGA resources to exclude the faulty resources and to guarantee system survival, but does not automatically re-compute the correct results starting from the correct inputs of the physical tile that became faulty. Moreover, reconfiguration is not always instantaneous, since the new mapping needs to be identified and applied.

After a fault occurs, erroneous results are propagated toward the outputs. In the worst case, correct outputs begin to be delivered again when the FPGA completes the computation on the inputs that arrived after the reconfiguration. The longer the repair time by reconfiguration and the longer the computation latency, the more erroneous results will be delivered.
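A rough estimate of this effect can be sketched in Python. The model is an assumption introduced here, not a result of the paper: outputs are delivered at a fixed period, and every output delivered during the repair time plus one computation latency is counted as erroneous. All parameter names and values are hypothetical.

```python
def worst_case_erroneous_outputs(repair_cycles, latency_cycles, output_period_cycles):
    """Estimate how many outputs may be wrong before correct delivery resumes."""
    if output_period_cycles <= 0:
        raise ValueError("output period must be positive")
    # Window during which delivered results cannot be trusted.
    window = repair_cycles + latency_cycles
    # Ceiling division on integers, without floating point.
    return -(-window // output_period_cycles)


fast_repair = worst_case_erroneous_outputs(120, 30, 10)
slow_repair = worst_case_erroneous_outputs(1200, 30, 10)
```

Under this model the count grows linearly with both the repair time and the latency, which matches the qualitative statement above.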

When the application is so mission critical that no erroneous result may be delivered, suitable concurrent error correction techniques must be adopted. If computation continues during the identification of the new mapping, on-line correction ensures that no erroneous result is delivered during that period. However, output correctness cannot easily be guaranteed while the new mapping is being applied to the FPGA configuration memory. During this time the operations mapped on the CLBs and the interconnections may not continuously correspond to the desired computation, since updating the FPGA configuration memory is not an atomic operation performed in a single memory write cycle. Result delivery should therefore be suspended until the FPGA has been completely reconfigured and the first correct output after reconfiguration has been produced. If this small repair time is still not acceptable for the envisioned application, more complex mappings should be considered in order to ensure that at least one instance of the application circuit continues to operate during reconfiguration without being affected by re-mapping activities.

7. Graceful degradation of performance

The use of segmentation and pagination also opens a new perspective for extending system survival when the time constraints of the application can be relaxed. When no spare physical tiles are available to support a new configuration that excludes the faulty ones, time-sharing can be adopted to multiplex the operations of several logical tiles onto the same physical one.

Consider a set of logical tiles that need to share the same physical tile in time, but that are mutually data independent (i.e., none of them takes inputs that depend on the outputs of another tile in the group). Suitable circuits load the configuration of each logical tile, one at a time, onto the shared physical one, activate the computation of the loaded logical tile and, possibly, hold the results for the subsequent operations in the application circuit. Graceful degradation of performance is thus obtained while preserving the full functionality of the system. If the logical tiles in the group are not data independent, they must be loaded and executed according to the data dependencies.
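When the tiles sharing a physical tile are data dependent, a valid load order is any topological order of their dependency graph. The Python sketch below uses the standard-library `graphlib` module (Python 3.9 or later); the tile names and the dependency table are hypothetical, and the cost of reloading configurations is not modeled.

```python
from graphlib import TopologicalSorter


def load_schedule(depends_on):
    """Return an order in which the logical tiles can be loaded and executed.

    depends_on: logical tile -> set of logical tiles whose outputs it consumes
    Raises graphlib.CycleError if the dependencies are cyclic.
    """
    return list(TopologicalSorter(depends_on).static_order())


# T3 needs the results of T1 and T2; T2 needs T1; T4 is independent.
depends_on = {"T3": {"T1", "T2"}, "T2": {"T1"}, "T1": set(), "T4": set()}
schedule = load_schedule(depends_on)
```

Any order returned places each tile after the tiles it depends on, so the registers holding intermediate results are always written before they are read.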

This approach to virtual FPGAs is similar to the management of central memory by segmentation and pagination. More details on virtual FPGAs can be found in [34-36]: time multiplexing of logical tiles for fault tolerance is an immediate extension of these approaches, which were initially designed to overcome the size limitation of FPGA devices.

Suitable standard additional circuits must be introduced
to manage the logical tiles in time sharing. Circuits and
memory storage are needed to hold and to apply the
configuration of the logical tiles and their
interconnections when they are not currently mapped
onto physical ones. Registers are needed to hold the inputs
and the outputs of logical tiles when they are removed
from the physical ones. Appropriate selection signals are
needed to activate the portions of the computation at the
correct time.
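
The role of these additional resources can be illustrated with a small
behavioural model. This is only a sketch under stated assumptions: the
classes LogicalTile and SharedPhysicalTile, the signal names, and the
use of a Python function as a stand-in for a stored configuration are
all hypothetical and do not describe an actual circuit. The model keeps
a configuration memory for the logical tiles that are not mapped,
registers that hold inputs and outputs, and a selection step that
activates one logical tile at a time.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class LogicalTile:
    name: str
    inputs: Tuple[str, ...]       # signals read from the holding registers
    output: str                   # signal written back to a holding register
    function: Callable[..., int]  # stand-in for the stored configuration


@dataclass
class SharedPhysicalTile:
    config_memory: Dict[str, LogicalTile] = field(default_factory=dict)
    registers: Dict[str, int] = field(default_factory=dict)
    loaded: Optional[str] = None
    reconfigurations: int = 0

    def store(self, tile: LogicalTile) -> None:
        """Hold the configuration of a logical tile that is not mapped."""
        self.config_memory[tile.name] = tile

    def select(self, name: str) -> LogicalTile:
        """Selection signal: load the named configuration if not active."""
        tile = self.config_memory[name]
        if self.loaded != name:
            self.loaded = name
            self.reconfigurations += 1
        return tile

    def step(self, name: str) -> None:
        """Load one logical tile, run it on held inputs, hold its output."""
        tile = self.select(name)
        operands = [self.registers[signal] for signal in tile.inputs]
        self.registers[tile.output] = tile.function(*operands)


def run_time_shared(tile: SharedPhysicalTile, schedule: List[str],
                    primary_inputs: Dict[str, int]) -> Dict[str, int]:
    """Execute the logical tiles one at a time on the shared physical tile."""
    tile.registers.update(primary_inputs)
    for name in schedule:
        tile.step(name)
    return tile.registers


shared = SharedPhysicalTile()
shared.store(LogicalTile("add", ("x", "y"), "a", lambda x, y: x + y))
shared.store(LogicalTile("mul", ("x", "y"), "b", lambda x, y: x * y))
shared.store(LogicalTile("sub", ("a", "b"), "c", lambda a, b: a - b))
result = run_time_shared(shared, ["add", "mul", "sub"], {"x": 3, "y": 4})
print(result["c"], shared.reconfigurations)  # prints: -5 3
```

In this example the tiles add and mul are data independent, while sub
must run last because it reads the values held for both of them. The
three reconfigurations show the time cost paid for sharing one
physical tile.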

8. Conclusions

This paper presented innovative approaches to the
design of fault-tolerant systems based on an application-level
view of the architecture, so as to exploit the
redundancy optimization opportunities offered by the
intrinsic characteristics of the application computation.

Segmentation of the application computation allows
the overall computation to be partitioned into portions that can
be loaded into physical partitions of the FPGA device,
thus limiting the hardware overhead due to checking,
diagnosis, and reconfiguration.

References

[1] W.W. Peterson, E.J. Weldon: Error Correcting Codes, MIT Press,
Cambridge, 1972

[2] T.R.N. Rao: Error Coding for Arithmetic Processors, Academic
Press, NY, 1974

[3] S. D'Angelo, C. Metra, S. Pastore, A. Pogutz, G.R. Sechi,
“Fault-Tolerant Voting Mechanism and Recovery Scheme for TMR
FPGA-based Systems”, Proc. IEEE Int. Symp. Defect and Fault
Tolerance in VLSI Systems, 1998

[4] M. Abramovici, C. Stroud, C. Hamilton, S. Wijesuriya, V. Verma,
“Using roving STARs for on-line testing and diagnosis of FPGAs in
fault-tolerant applications”, Proc. International Test Conference,
1999

[5] K.A. Kwiat, W.H. Debany Jr., S. Hariri, “Software fault
tolerance using dynamically reconfigurable FPGAs”, Proc. Sixth
Great Lakes Symposium on VLSI, 1996

[6] N. Park, S.J. Ruiwale, F. Lombardi, “Testing the configurability
of dynamic FPGAs”, Proc. IEEE Int. Symp. Defect and Fault
Tolerance in VLSI Systems, 2000

[7] M. Abramovici, C. Stroud, “BIST-based detection and diagnosis
of multiple faults in FPGAs”, Proc. International Test Conference,
2000

[8] A. Doumar, H. Ito, “Testing approach within FPGA-based fault
tolerant systems”, Proc. Asian Test Symposium, 2000

[9] W. Feng, F.J. Meyer, F. Lombardi, “Novel control pattern
generators for interconnect testing with boundary scan”, Proc.
Int. Symp. Defect and Fault Tolerance in VLSI Systems, 1999

[10] W. Feng, W.K. Huang, F.J. Meyer, F. Lombardi, “On the
Complexity of Sequential Testing in Configurable FPGAs”, Proc.
Int. Symp. Defect and Fault Tolerance in VLSI Systems, 1998

[11] W.L. Huang, F.J. Meyer, F. Lombardi, “Multiple fault detection
in logic resources of FPGAs”, Proc. Int. Symp. Defect and Fault
Tolerance in VLSI Systems, 1997

[12] F. Ferrandi, F. Fummi, L. Pozzi, M. Sami,
“Configuration-specific test pattern extraction for field
programmable gate arrays”, Proc. Int. Symp. Defect and Fault
Tolerance in VLSI Systems, 1997

[13] C. Stroud, E. Lee, M. Abramovici, “BIST-based diagnostics of
FPGA logic blocks”, Proc. International Test Conference, 1997

[14] S.-J. Wang, T.-M. Tsai, “Test and diagnosis of faulty logic
blocks in FPGAs”, Proc. Int. Conf. Computer-Aided Design, 1997

[15] X.T. Chen, W.K. Huang, F. Lombardi, X. Sun, “A row-based FPGA
for single and multiple stuck-at fault detection”, Proc. Int.
Workshop Defect and Fault Tolerance in VLSI Systems, 1995

[16] K. El-Ayat, R. Cahn, C.L. Chan, T. Speers, “Array architecture
for ATG with 100% fault coverage”, Proc. Int. Workshop Defect &
Fault Tolerance in VLSI Systems, 1991

[17] A. Doumar, H. Ito, “Design of switching blocks tolerating
defects/faults in FPGA interconnection resources”, Proc. Int.
Symp. Defect & Fault Tolerance in VLSI Systems, 2000

[18] P.K. Lala, A. Walker, “An on-line reconfigurable FPGA
architecture”, Proc. Int. Symp. Defect and Fault Tolerance in
VLSI Systems, 2000

[19] M. Fukushi, S. Horiguchi, “Self-reconfigurable mesh array
system on FPGA”, Proc. Int. Symp. Defect and Fault Tolerance in
VLSI Systems, 2000

[20] N.R. Mahapatra, S. Dutt, “Efficient network-flow based
techniques for dynamic fault reconfiguration in FPGAs”, Proc.
Int. Symp. Fault-Tolerant Computing, 1999

[21] P.K. Lala, A. Singh, A. Walker, “A CMOS-based logic cell for
the implementation of self-checking FPGAs”, Proc. Int. Symp.
Defect and Fault Tolerance in VLSI Systems, 1999

[22] G.A. Mojoli, D. Salvi, M. Sami, G.R. Sechi, R. Stefanelli,
“KITE: a behavioural approach to fault-tolerance in FPGA-based
systems”, Proc. Int. Symp. Defect and Fault Tolerance in VLSI
Systems, 1996

[23] G.H. Chapman, B. Dufort, “Making defect avoidance nearly
invisible to the user in wafer scale field programmable gate
arrays”, Proc. Int. Symp. Defect and Fault Tolerance in VLSI
Systems, 1996

[24] J. Lach, W.H. Mangione-Smith, M. Potkonjak, “Enhanced FPGA
reliability through efficient run-time fault reconfiguration”,
IEEE Transactions on Reliability, 2000

[25] W. Shi, K. Kumar, F. Lombardi, “On the complexity of switch
programming in fault-tolerant configurable chips”, Proc. Int.
Symp. Defect and Fault Tolerance in VLSI Systems, 2000

[26] L. Antoni, R. Leveugle, B. Feher, “Using run-time
reconfiguration for fault injection in hardware prototypes”,
Proc. Int. Symp. Defect and Fault Tolerance in VLSI Systems, 2000

[27] J. Emmert, C. Stroud, B. Skaggs, M. Abramovici, “Dynamic
fault tolerance in FPGAs via partial reconfiguration”, Proc.
IEEE Symp. Field-Programmable Custom Computing Machines, 2000

[28] S. Dutt, V. Shanmugavel, S. Trimberger, “Efficient incremental
rerouting for fault reconfiguration in field programmable gate
arrays”, Proc. Int. Conf. on Computer-Aided Design, 1999

[29] A. Doumar, S. Kaneko, H. Ito, “Defect and fault tolerance
FPGAs by shifting the configuration data”, Proc. Int. Symp.
Defect and Fault Tolerance in VLSI Systems, 1999

[30] J. Lach, W.H. Mangione-Smith, M. Potkonjak, “Low overhead
fault-tolerant FPGA systems”, IEEE Trans. on Very Large Scale
Integration Systems, 1998

[31] F. Hanchek, S. Dutt, “Methodologies for tolerating cell and
interconnect faults in FPGAs”, IEEE Transactions on Computers,
1998

[32] A. Mathur, C.L. Liu, “Timing driven placement reconfiguration
for fault tolerance and yield enhancement in FPGAs”, Proc.
European Design & Test Conference, 1996

[33] G. De Micheli: Synthesis and Optimization of Digital Circuits,
McGraw-Hill, New York, NY, 1994

[34] W. Fornaciari, V. Piuri, “Virtual FPGAs: some steps behind
the physical barriers”, Proc. IEEE Reconfigurable Architectures
Workshop, 1998

[35] W. Fornaciari, V. Piuri, “General methodologies to virtualize
FPGAs in Hw/Sw systems”, Proc. IEEE Midwest Symposium on
Circuits and Systems, IN, USA, 1998

[36] W. Fornaciari, V. Piuri, “Virtualization of FPGA via
Segmentation”, Proc. ACM Int. Symposium on FPGA, 2000

[37] R.A. Bergamaschi, S. Raje, I. Nair, L. Trevillyan,
“Control-Flow Versus Data-Flow-Based Scheduling: Combining both
Approaches in an Adaptive Scheduling System”, IEEE Trans. on
Very Large Scale Integration Systems, 1997

[38] G. Lakshminarayana, A. Raghunathan, N.K. Jha, “Behavioral
Synthesis of Fault Secure Controller/Datapaths Based on Aliasing
Probability Analysis”, IEEE Trans. on Computers, 2000

[39] A. Antola, V. Piuri, M. Sami, “Optimising High-Level
Synthesis for Self-Checking Arithmetic Circuits”, Proc. Int.
Symp. Defect & Fault Tolerance in VLSI Systems, 1996

[40] A. Antola, V. Piuri, M. Sami, “A High-Level Synthesis
Approach to Optimum Design of Self-Checking Circuits”, Proc.
European Design Automation Conference, 1996

[41] A. Antola, V. Piuri, M. Sami, “High-level synthesis of data
paths with concurrent error detection”, Proc. Int. Symp. Defect
and Fault Tolerance in VLSI Systems, 1998

[42] A. Antola, F. Ferrandi, V. Piuri, M. Sami, “Semi-Concurrent
Error Detection in Data Paths”, IEEE Trans. on Computers, 2001

[43] R. Negrini, M. Sami, R. Stefanelli: Fault tolerance through
reconfiguration in VLSI and WSI arrays, MIT Press, Cambridge, MA,
1988