Secure Partial Reconfiguration of FPGAs

A. Sheikh Zeineddini





Abstract

This paper investigates a method to improve the security of SRAM FPGAs by exploiting embedded processor cores and dynamic partial reconfiguration. Currently, Xilinx devices cannot be partially reconfigured with an encrypted bitstream. To perform secure partial reconfiguration, the platforms implemented in this paper use an application running on the hard or soft embedded processor core to authenticate and decrypt an encrypted partial bitstream. This scheme enables embedded systems to increase their design security without requiring external circuitry.

The presented platforms have been designed with a minimal footprint for both the embedded PowerPC hard core and the Xilinx MicroBlaze soft core processor, targeting Xilinx Virtex-II Pro devices. A comparison of the timing results and resource utilization of each design is also provided.


Keywords


Dynamic Reconfiguration, Embedded Processor,
Design Security, FPGA


I. INTRODUCTION

As the performance gap between FPGAs and ASICs decreases [1], platform FPGAs with various configurable elements and embedded blocks provide new solutions for high-density and high-performance embedded system designs. These platforms not only enable system architects to design and develop complex custom systems using processor and interoperable IP cores, but also provide technologies such as dynamic reconfiguration of part of an FPGA while other areas of the device remain operational. Partial dynamic reconfiguration offers many advantages, especially for applications that require adaptive and flexible hardware, such as mobile communications. Deploying run-time reconfiguration can reduce chip area and power consumption.

Given this wide range of features, platform FPGAs address many new application areas, and their growing popularity makes design security mechanisms even more important. Design security must protect a design against cloning and reverse engineering, which correspond to different classes of attack. A survey in [2] analyzes possible attacks against FPGAs. For SRAM FPGAs, this directly concerns protecting the bitstream, especially during configuration and reconfiguration. Bitstream encryption increases the level of security and protects the configuration bitstream against attackers.

Manuscript received May 13, 2005. This research was done as a thesis submitted in partial fulfillment of the requirements for the degree of Master of Science at George Mason University.

A. Sheikh Zeineddini is with the Electrical Engineering Department, George Mason University, Fairfax, VA 22020 USA (e-mail: asheikhz@gmu.edu).


The Xilinx [3] security system uses CAD tools for bitstream encryption and an internal circuit for decryption. The major disadvantage of this scheme is that the partial reconfiguration capability of the FPGA is disabled, so a device configured with an encrypted bitstream cannot be partially reconfigured. By using new features of platform FPGAs to create a self-reconfigurable platform [4], bitstream security can be achieved specifically for designs that benefit from partial reconfiguration. Self-reconfiguration extends the dynamic reconfiguration capability: particular circuits on the logic array, such as embedded processor cores, can be used to control the configuration of other areas of the FPGA.

In this paper, a self-reconfigurable platform for Xilinx Virtex-II Pro devices is realized. It can perform secure partial reconfiguration of the FPGA after the initial configuration, and it provides the flexibility to use arbitrary algorithms for encrypting and decrypting partial bitstreams.

The rest of the paper is organized as follows. Section II presents related work and background. Section III gives an overview of the EDK tools and the evaluation board. Section IV explains the hardware architecture of the implemented self-reconfigurable systems for both hard and soft processor cores. Section V presents the experiment methodology, and Section VI presents the results obtained. Section VII provides the discussion and conclusion.

II. RELATED WORK AND BACKGROUND

A. Related Work

The Xilinx security scheme is simple and efficient. All Virtex-II family devices (Virtex-II, Virtex-II Pro, and Virtex-II Pro X FPGAs) use a Triple DES encryption scheme [5], whereas in Virtex-4 Xilinx replaced Triple DES with AES to increase security and throughput. The scheme relies on the Xilinx ISE CAD tools for both bitstream encryption and key generation. For decryption, it uses an on-chip decryptor together with internal decryption keys stored in a dedicated memory. Either an externally connected battery or the auxiliary power supply (VCCAUX) powers the volatile key storage. The keys are erased if the device is tampered with.
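For reference, the standard Triple DES construction (the encrypt-decrypt-encrypt, or EDE, form) composes three single-DES operations as shown below. Here $E_K$ and $D_K$ denote DES encryption and decryption under key $K$, $P$ is the plaintext, and $C$ is the ciphertext. This shows only the generic standard. The text above does not state which keying option the Virtex-II on-chip decryptor uses, so treat the key roles as an assumption.

```latex
\begin{aligned}
C &= E_{K_3}\bigl(D_{K_2}\bigl(E_{K_1}(P)\bigr)\bigr),\\
P &= D_{K_1}\bigl(E_{K_2}\bigl(D_{K_3}(C)\bigr)\bigr).
\end{aligned}
```

When all three keys are equal, the inner $D_{K_2}(E_{K_1}(\cdot))$ pair cancels and the construction reduces to single DES. This is why the EDE ordering is backward compatible.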

Figure 1 shows the Xilinx security system. The drawbacks of this scheme are the extra area and cost of the external battery and the loss of partial reconfiguration for encrypted bitstreams.



Another method, proposed in [6], removes the need for an external battery by storing a secret key on the FPGA itself, for example by using a laser to engrave the key. This requires the FPGA to contain both encryption and decryption circuits, so no software support for encryption is needed. However, this solution uses more FPGA silicon area. It also lacks flexibility, because the encryption and decryption circuits are fixed and cannot be upgraded or replaced with another algorithm.

In [7], a new solution is proposed, although it cannot yet be implemented, in which a dedicated configuration controller manages the encryption and decryption configuration schemes. Because it relies on partial reconfiguration, it requires some form of self-reconfigurable platform. In this platform, the configuration controller places the selected encryption and decryption cores on the FPGA and then removes them to free chip area. It also needs an embedded key instead of externally powered key storage.

The method is flexible and adjusts the security level to application needs. However, it is relatively complex given the limitations on partial reconfiguration imposed by the FPGA manufacturers and the CAD tools.

The work presented in this paper focuses on a more specific case in which only secure partial reconfiguration after the initial configuration is considered. By using embedded microprocessors as a configuration controller inside the chip, partial reconfiguration can be achieved with hardware or software encryption/decryption cores. The following subsection provides background on the flows for partial reconfiguration, along with an overview of the Virtex-II Pro platform FPGA and the tools for creating a self-reconfigurable design.

B. Background

The Virtex-II Pro device used as the platform FPGA in our designs combines a variety of features in the FPGA fabric with hardware and software IP cores [8]. It incorporates embedded PPC405 cores and RocketIO multi-gigabit transceivers. The embedded PPC405 core is a 32-bit Harvard-architecture processor with functional units such as a cache unit and an MMU. It uses a five-stage pipeline, and most instructions execute in a single cycle. The processor block contains the embedded IBM PowerPC 405 RISC CPU (PPC405) core, the on-chip memory controller, and integration circuitry. It is compatible with the CoreConnect bus architecture [9], which enables compliant IP cores to integrate with the block. The CoreConnect architecture provides three buses for interconnecting hard and soft IP cores: the Processor Local Bus (PLB), the On-chip Peripheral Bus (OPB), and the Device Control Register (DCR) bus.

The Virtex-II Pro is configured by selecting a configuration mode and interface. Through this interface, data is loaded into the configuration memory, which is segmented into frames [10]. Data is loaded on a column basis, and each column contains frames whose size depends on the device.

The Virtex-II Pro configuration architecture features an Internal Configuration Access Port (ICAP) that provides a configuration interface to the FPGA fabric. The ICAP does not support different configuration modes and cannot be used for full configuration. Without a handshaking mechanism, the ICAP interface can be clocked up to a maximum frequency of 66 MHz.

In active partial reconfiguration, new data is loaded to reconfigure a particular area of the FPGA while the rest of the device remains operational. Xilinx introduces two flows for active partial reconfiguration [11]:

1) Module-Based: This flow is based on the Xilinx Modular Design methodology [7] and is further subdivided depending on whether communication is needed between the modules. A special bus macro is required for inter-module communication.

Bus macros use fixed routing resources so that routing does not vary between implementations. Currently, bus macros are implemented with 3-state buffers (TBUFs) and horizontal line resources in the FPGA. However, Xilinx will introduce a new implementation with the release of ISE 8.1, because Virtex-4 devices do not have TBUFs. Figure 2 shows a bus macro used for inter-module communication and its implementation with 3-state buffers.

Reconfigurable modules in the design must satisfy specific constraints on their width, height, and placement. Each module may use only the resources encompassed by its width. Communication with other modules, both fixed and reconfigurable, must take place through bus macros.

Fig. 1. Xilinx Security Solution (Xilinx software: configuration generator and encryption software; configuration storage holding the encrypted configuration; Xilinx FPGA: decryption circuit, key storage, and configuration memory; external battery for the key storage)

Table 4. Decryption Phase Timing Results (Clock Cycles)

Fig. 2. Physical implementation of a 4-bit bus macro

HDL coding and synthesis follow general guidelines concerning the structure of the top-level design, the instantiation of bus macros, shared signals, and synthesis attributes.

The implementation flow takes place in three phases after design entry. In the initial budgeting phase, the design is floorplanned and constrained based on the properties of each module. The result is a .ucf file that is used in the active implementation phase. This phase places and routes each module separately in the context of the top-level logic and constraints. The final assembly phase combines all the placed and routed modules generated in the previous phase into a complete FPGA design. To maintain the performance of each module, its placement and routing are preserved.

At present, bitstreams generated for the full design require that the initial bitstream include at least one variation of every partially reconfigurable module. The initial bitstream must therefore be a complete design, because all global resources, such as clocking logic, need to be placed and locked down. Bitstream frames for clocks are separate from other frames. This imposes a limitation: a completely separate module cannot be added to an initial design with the partial reconfiguration flow.


2) Difference-Based: With this flow, the design can be changed either at the front end or at the back end. For front-end changes to the HDL code or schematics, the design must be re-synthesized and re-implemented. For back-end changes, the FPGA Editor tool can modify sections of the design directly. Many types of changes can be made with this tool, including routing information, LUT programming, BRAM contents, and I/O standards.

With the proper option settings, the bitstream generator BitGen can create a partial bitstream that contains only the differences between the modified design and the initial bitstream. The resulting partial bitstream is small and quick to load. A partial bitstream can be loaded only after the device has powered up and an initial bitstream has been loaded. The design must account for the transition time of the reconfigurable module(s), and other modules should not rely on the state of the signals connected to the reconfigurable module.

III. EDK TOOLS AND XILINX ML310 BOARD

(Information about the EDK tools and the board will be provided here.)

IV. IMPLEMENTED SELF-RECONFIGURABLE SYSTEMS

Figures 3 and 4 show the hardware components of the self-reconfigurable platforms constructed for the embedded PowerPC and the MicroBlaze soft processor core, respectively. Both systems run at 100 MHz. Both systems require the OPB, because the ICAP control logic is currently implemented in software (hardware drivers) and its interface connects the ICAP to the OPB. This interface also instantiates a BRAM on the OPB. The PowerPC system requires the PLB, so a PLB-to-OPB bridge is also needed to connect to the OPB. The MicroBlaze system, in contrast, needs two instances of the Local Memory Bus (LMB) connected to a dual-port BRAM, one for instructions and one for data. The DDR memory on the board was selected as the external memory for storing the partial bitstream, so an OPB DDR controller was used in both systems. The UART and additional features are not essential parts of the self-reconfigurable systems, but they make the user application easier to use. The systems were designed to keep the footprint of their components minimal.

V. EXPERIMENT METHODOLOGY

The scenario considered for the experiment is as follows. The self-reconfigurable system reads an authenticated and encrypted partial bitstream stored in an external memory and then authenticates it. If authentication succeeds, the system decrypts the partial bitstream and configures the device through the ICAP. Software cores are used for authentication and decryption. The keys are assumed to be stored in the program, although other arrangements are also possible.

Fig. 3. PowerPC System (components: Virtex-II Pro, PowerPC 405, PLB bus, PLB2OPB bridge, OPB bus, BRAM, ICAP, OPB DDR controller, UART, JTAG controller, user interface)

Fig. 4. MicroBlaze System (components: Virtex-II Pro, MicroBlaze with ILMB and DLMB to a dual-port BRAM, OPB bus, BRAM, ICAP, OPB DDR controller, UART, debug module, OPB watchdog timer, user interface)




















To perform the experiment with the implemented self-reconfigurable systems, the first step was generating a partial bitstream. The Difference-Based method was selected for this purpose, because the requirements of the Module-Based flow were problematic on the ML310 board. An additional MicroBlaze system was implemented in the form of a microcontroller. Its design included only Block RAM memory and GPIO connected to the LEDs of the board, and the application running on it generated a pattern on the LEDs. Using the Difference-Based flow, a partial bitstream was created that changed the contents of the BRAM in which the system's program was stored, so that a different pattern would appear on the LEDs. Using EDK, this system was combined separately with each of the self-reconfigurable systems.

The next step was developing the application that runs on the self-reconfigurable system and performs the required tasks. The ICAP API [4] was used to transfer data between the external configuration memory and the OPB BRAM configuration cache. The ICAP API defines methods for accessing the configuration logic through the ICAP port. HMAC-SHA1 was used as the authentication algorithm and AES for encryption/decryption. Both cores were freely available implementations of the algorithms created by Dr. Gladman [ ]. These cores were ported to the EDK environment so that the application could use them as libraries.
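For reference, HMAC as specified in RFC 2104 wraps a hash function $H$ with block size $B$ around a key $K$. For SHA-1, $B$ is 64 bytes. A key longer than $B$ is first hashed, and the key is then zero-padded to $B$ bytes to form $K'$. Here $\mathrm{ipad}$ and $\mathrm{opad}$ are the bytes 0x36 and 0x5C repeated $B$ times, and $\|$ denotes concatenation. The experiment description says the bitstream is encrypted and then signed. Assuming this means the MAC is computed over the encrypted bitstream, the authentication step recomputes this value over the ciphertext and compares it with the stored tag.

```latex
\mathrm{HMAC}(K, m) = H\bigl((K' \oplus \mathrm{opad}) \,\|\, H\bigl((K' \oplus \mathrm{ipad}) \,\|\, m\bigr)\bigr)
```

With SHA-1 as $H$, the resulting tag is 20 bytes long.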

After the partial bitstream was encrypted and signed, the Xilinx Microprocessor Debugger (XMD) was used to load it from the host machine connected to the board into the DDR memory on the board. The application running on the self-reconfigurable platform then performed three steps. It authenticated the signed partial bitstream against the stored MAC value, decrypted the encrypted partial bitstream using the stored key, and dynamically and partially reconfigured the other active system on the FPGA. The experiment was judged successful because the new pattern was displayed on the LEDs of the board.

The next section presents the timing results obtained for the execution of each phase of the application, along with a summary of device resource utilization.

VI. RESULTS

A. Timing Measurements

For each phase of the process (authentication, decryption, and configuration), 10 measurements were taken by recording the number of clock cycles each processor required to execute the functions. The PowerPC system needed no extra component, because the processor contains a time-base register driven by the system clock. For the MicroBlaze system, a watchdog timer on the OPB, which contains a time-base register, was used. The OPB runs on the system clock. Tables 1, 2, and 3 provide the results of each phase, and Table 4 compares the results.

B. Resource Utilization Summary

The systems were designed with only the required components. Table 5 summarizes device utilization, and Table 6 shows the contribution of the different IP cores to that result.

VII. CONCLUSION

Table 1. Authentication Phase Timing Results (Clock Cycles)

Table 2. Decryption Phase Timing Results (Clock Cycles)

To be added…

APPENDIX

1. The detailed procedure for creating the partial bitstream using the Difference-Based method will be provided here.

2. The detailed procedure for creating the self-reconfigurable systems using EDK will be provided here.

REFERENCES

[1] S. Wong, S. Vassiliadis, and S. Cotofana, “Future directions of (programmable and reconfigurable) embedded processors,” in Embedded Processor Design Challenges, Workshop on Systems, Architectures, Modeling, and Simulation (SAMOS), 2002.

[2] T. Wollinger, J. Guajardo, and C. Paar, “Security on FPGAs: State-of-the-art implementations and attacks,” ACM, 2004.

[3] Xilinx, Inc. web site. http://www.xilinx.com/.

[4] B. Blodget, P. James-Roxby, E. Keller, S. McMillan, and P. Sundararajan, “A self-reconfiguring platform,” in Proceedings of the International Conference on Field Programmable Logic, Lisbon, Portugal, Sept. 2003.

[5] R. Krueger, “Using high security features in Virtex-II series FPGAs,” Xilinx Application Note XAPP766, version 1.0, Xilinx, Inc., July 2004.

[6] L. Bossuet, G. Gogniat, and W. Burleson, “Dynamically configurable security for SRAM FPGA bitstreams.”

[7] “Development System Reference Guide,” Xilinx, Inc., 2005.

[8] “Virtex-II Platform FPGA Handbook,” version 2.0, Xilinx, Inc., 2004.

[9] IBM web site. http://www.chips.ibm.com/products/coreconnect, 2003.

[10] “Virtex-II Platform FPGA User Guide,” version 4.0, Xilinx, Inc., 2005.

[11] “Two flows for partial reconfiguration: Module based or difference based,” Xilinx Application Note XAPP290, version 1.2, Xilinx, Inc., 2004.

[12] S. Heath, Embedded Systems Design, 2nd ed. Newnes, 2002.

[13] “Embedded System Tools Reference Manual,” version 3.0, Xilinx, Inc., 2004.

[14] “Platform Studio User Guide,” version 3.0, Xilinx, Inc., 2004.

[15] “PowerPC Processor Reference Guide,” version EDK 6.1, Xilinx, Inc., 2003.

[16] “EDK OS and Libraries Reference Manual,” version 3.0, Xilinx, Inc., 2004.





Table 3. Configuration Phase Timing Results (Clock Cycles)



Table 4. Comparison of the Timing Results




Table 5. Device Utilization Summary




Table 6. Resource Usage Summary of IP Cores
