Hardware Algorithm Verification Platform


Chuan Du

July, 2013


Abstract

This article describes the Hardware Algorithm Verification Platform and presents one demonstration of its performance. The platform comprises a hardware algorithm verification flow, a hardware algorithm design flow, and software algorithm generation for analyzing hardware performance.


INTRODUCTION

FPGA technology is used for ultra-low-latency trading, particularly in the high-frequency trading (HFT) field, and the algorithm plays an important role in this trend. Transplanting a software financial algorithm to a hardware algorithm is an important step for many engineers who want to make progress in this field. The Hardware Algorithm Verification Platform helps them verify their hardware algorithm on the FPGA platform and compare its result with the software result to evaluate performance.

This platform includes C/C++ algorithm development, FPGA system development, and FPGA system verification, which together cover all the design flows for software and hardware algorithm design.

Algorithm developers do not need to understand the details of this platform. They just follow the algorithm development flow and verify the system.

Platform System Description

[Figure: platform block diagram. Software platform: Data File Read, Software Algorithm, Software Result File Write, TCP Socket Send, TCP Socket Receive, Hardware Result File Write. FPGA: Nios II, Ethernet interface, Data process RAM, UART, Timer, DDR interface, Hardware Algorithm. The two sides are linked by a TCP/IP connection.]

System feature list

- Socket TCP/IP protocol
- uCOS-II multi-thread design
- Nios II INICHE (NicheStack) TCP/IP protocol stack
- 32-bit RISC Nios II microprocessor
- Perl command-line operation
- Tcl ModelSim command-line verification
- Ethernet connection, which can be changed to another connection such as PCIe with no change in other modules
- Hardware float algorithm design
- Software float algorithm design


FPGA Hardware structure description

The main system structure of the FPGA hardware algorithm design platform is as follows.

In this structure, Nios II is the Altera® microprocessor, which runs software inside the FPGA. All devices are connected by the system bus and controlled by the Nios II microprocessor. Algorithm traders can connect to the platform through the serial port connector. (Additional Ethernet interfaces can be added to connect to the trader's computer.) RJ-45 is the usual Ethernet interface, used to communicate with the broker, data feed, or financial servers through the Internet.

The algorithm system is the part in which trading algorithm instructions are designed. Algorithm traders can design their trading algorithm in Matlab, generate FPGA programs, and integrate them into the FPGA system very easily. Of course, traders who are proficient in FPGA programming can use the FPGA development flow to design the algorithm and optimize the system according to the real environment.

[Figure: FPGA trading platform. Hardware system: Nios II, Ethernet interface, UART, Timer, Flash interface, DDR interface and System Register Array inside the FPGA, with external DDR2, FLASH, Ethernet PHY, RJ-45 and serial port connector. Algorithm system: data_proc_ram and the FPGA Hardware Algorithm.]

Nios II software structure description

Nios II software development is also needed in this platform in order to control the system and connect to the Internet.

Driver-level programs provide functions to higher-level software by reading and writing hardware registers. uCOS-II is a real-time operating system that is widely used in embedded systems; here it manages multiple threads and tasks. INICHE is a lightweight TCP/IP stack, which this platform uses to connect to the Internet.

At the application level, data format conversion modules make sure that the hardware receives data in the correct IEEE 754 float format. The String to Float module converts the ASCII string to a float value. The IEEE 754 to HEX module converts the IEEE 754 value to the hex format used by the hardware algorithm.
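
The two conversions described above can be sketched in a few lines of C. This is a minimal illustration, not the platform's actual Nios II code: the function names `string_to_float`, `float_to_bits` and `float_to_hex` are hypothetical, and the sketch assumes that `float` on the build machine is a 32-bit IEEE 754 single-precision value.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parse an ASCII decimal string into a 32-bit float. */
float string_to_float(const char *text)
{
    assert(text != NULL);
    return strtof(text, NULL);
}

/* Return the raw bit pattern of a float (assumed IEEE 754 single precision). */
uint32_t float_to_bits(float value)
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);  /* well-defined way to reinterpret bits */
    return bits;
}

/* Write the bit pattern as 8 uppercase hex digits plus a terminating NUL. */
void float_to_hex(float value, char out[9])
{
    assert(out != NULL);
    snprintf(out, 9, "%08X", (unsigned int)float_to_bits(value));
}
```

For example, the spot price string "22.50000" converts to the float 22.5, whose hex form is 41B40000.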

[Figure: Nios II software stack. Driver level: Nios II Timer, Eth Driver, FLASH Driver, DDR Driver, UART Driver and REG Driver on the Altera HAL (Hardware Abstraction Layer), with the C Standard Library. Protocol level: uCOS II (operating system), Iniche (TCP/IP stack), Socket. Application level: String to Float, IEEE 754 Char to Hex, Data Process RAM Write Interface, Data Process RAM Read Interface, Data Process RAM Status Monitor.]


C/C++ Software development structure description

The software development environment mainly sends the data to the hardware platform and generates the software algorithm result used to analyze the performance of the hardware algorithm.


[Figure: host program flow. Fileread (get the process data from file) feeds two paths. Hardware path: Cldatatransfer (process data and transfer it to hardware through the Ethernet socket TCP connection) runs WSAStartup, socket, connect and send, checking each for success, then FD_ISSET(socketsend, &readfds) and recv until iiRecd == 0 || iiRecd == SOCKET_ERROR, followed by closesocket and WSACleanup(); the received data goes through String to float and hardwarefilewrite (write back the processed data to file for future analysis). Software path: kernel_softare (data format and software algorithm process for comparison between hardware and software algorithm), then softwarefilewrite (write back the processed data to file for future analysis).]


Engineers can use an IDE such as VS2010 to develop the C/C++ algorithm without any understanding of this architecture. The design flow is discussed in the demonstration.
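
The send, FD_ISSET and recv steps named in the host flow chart can be sketched as follows. The article's host program uses Winsock (WSAStartup, closesocket); this sketch swaps in POSIX sockets so that it stays short, and the function names `send_all` and `receive_result` are hypothetical. It illustrates the pattern only and is not the platform's Cldatatransfer code.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Send the whole buffer, retrying on short writes. Returns 0 on success, -1 on error. */
int send_all(int sock, const char *buf, size_t len)
{
    size_t sent = 0;
    assert(buf != NULL);
    while (sent < len) {
        ssize_t n = send(sock, buf + sent, len - sent, 0);
        if (n <= 0)
            return -1;
        sent += (size_t)n;
    }
    return 0;
}

/*
 * Wait up to timeout_ms for data on a connected socket, then read it.
 * Returns the number of bytes read, 0 if the wait timed out or the peer
 * closed the connection, and -1 on error.
 */
long receive_result(int sock, char *buf, size_t len, int timeout_ms)
{
    fd_set readfds;
    struct timeval tv;
    int ready;

    assert(buf != NULL && len > 0);

    FD_ZERO(&readfds);
    FD_SET(sock, &readfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    ready = select(sock + 1, &readfds, NULL, NULL, &tv);
    if (ready < 0)
        return -1;
    if (ready == 0 || !FD_ISSET(sock, &readfds))
        return 0;

    return (long)recv(sock, buf, len, 0);
}
```

Waiting in select before calling recv lets the host give up after a timeout instead of blocking forever if the FPGA never answers.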




Whole system operation flow chart

[Figure: whole system operation flow chart. C host program (SLAVE): Fileread (get the process data from file), WSAStartup, socket, connect and send, each checked for success, then FD_ISSET(socketsend, &readfds) and recv until iiRecd == 0 || iiRecd == SOCKET_ERROR, closesocket, and hardwarefilewrite (write back the processed data to file for future analysis). Nios II uCOS II NicheStack TCP/IP stack (SERVER): OSTimeSet, netmain, alt_iniche_init, TK_NEWTASK(&wstask), WSCreateTasks, fd_listen = socket(AF_INET, SOCK_STREAM, 0), bind(fd_listen, (struct sockaddr *)&addr, sizeof(addr)), listen(fd_listen, 1), FD_SET(fd, &readfds), select(), http_handle_receive, String to Float, IEEE 754 Char to Hex, Data Process RAM Write Interface, Data Process RAM Read Interface, Data Process RAM Status Monitor, send. Hardware architecture: data_proc_ram RSV, Hardware Algorithm, data_proc_ram Send.]


Hardware and software algorithm developers do not need to understand this process. This information is for platform developers.

Algorithm Development Flow

This platform integrates all the technology that hardware and software algorithm developers need to design and verify a design. They just need to edit the original_data.txt file to supply the input float data and execute several commands to generate the hardware_out.txt and software_out.txt files, which contain the results of the hardware and software algorithms. The developers can then use other tools such as R and Matlab to analyze these data.
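
As an illustration of preparing the input, the following C sketch writes a record laid out like the sample original_data.txt shown later in the C/C++ design flow: a SENDNUM= header, an 8-digit count, then one value per line. The exact file format is not specified in the article, so the 8-character field width, the decimal count and the names `format_field` and `build_record` are all assumptions inferred from that sample.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format one value as exactly 8 characters, e.g. 22.5 -> "22.50000".
 * Assumption: values are non-negative and below 1e6, so keeping the
 * first 8 characters only drops trailing decimal digits.
 */
void format_field(double value, char out[9])
{
    char tmp[32];
    assert(out != NULL && value >= 0.0 && value < 1e6);
    snprintf(tmp, sizeof tmp, "%.6f", value);
    memcpy(out, tmp, 8);
    out[8] = '\0';
}

/*
 * Build the text record: header line, count line, then one field per line.
 * Returns the record length, or -1 if the buffer is too small.
 */
int build_record(const double *values, unsigned count, char *buf, size_t size)
{
    int used;
    unsigned i;

    assert(values != NULL && buf != NULL);

    used = snprintf(buf, size, "SENDNUM=\n%08u\n", count);
    if (used < 0 || (size_t)used >= size)
        return -1;

    for (i = 0; i < count; ++i) {
        char field[9];
        int n;
        format_field(values[i], field);
        n = snprintf(buf + used, size - (size_t)used, "%s\n", field);
        if (n < 0 || (size_t)n >= size - (size_t)used)
            return -1;
        used += n;
    }
    return used;
}
```

Calling `build_record` with the six Black-Scholes sample values (1.0, 22.5, 24.0, 0.067, 3.0, 0.08) yields the same eight lines as the sample file, without the explanatory annotations.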


File System Structure

- Test Result Input folder: Test data input and output folder. Algorithm developers just need to focus on the files in this folder and follow the execution commands.
- FPGA Hardware Algorithm system: Folder for the FPGA hardware algorithm VHDL/Verilog code.
- Nios II software system: Folder for the Nios II design project files; target folder of the Nios EDK project.
- FPGA Verification System: ModelSim FPGA hardware verification platform; ModelSim target folder.
- C/C++ Software development system: VS2010 project target folder.
- Perl Command Line: Folder for the command-line resources.




Hardware Design verification flows

1. Design the VHDL/Verilog code according to the target algorithm. The code is located in the folder %DUCAOCLSDKROOT%\board\EthDE2\hardware\EthDE2\ip\. For example, the Black-Scholes design resources are located in the folder %DUCAOCLSDKROOT%\board\EthDE2\hardware\EthDE2\ip\black_scholes.

2. Modify the compile_rtl.do file in %DUCAOCLSDKROOT%\designs\fpga_src_kernel\sim\env to match the location of the VHDL/Verilog resources.


###############################################################################
# Design Files
###############################################################################
vcom -2002 -work work ../../../../board/EthDE2/hardware/EthDE2/ip/black_scholes/ip/ALTFP_ADD_SUB/ALTFP_ADD_SUB.vhd
vcom -2002 -work work ../../../../board/EthDE2/hardware/EthDE2/ip/black_scholes/ip/ALTFP_CONVERT_FtoI/ALTFP_CONVERT_FtoI.vhd
vcom -2002 -work work ../../../../board/EthDE2/hardware/EthDE2/ip/black_scholes/ip/ALTFP_CONVERT_ItoF/ALTFP_CONVERT_ItoF.vhd
vcom -2002 -work work ../../../../board/EthDE2/hardware/EthDE2/ip/black_scholes/ip/ALTFP_DIV/ALTFP_DIV.vhd
vcom -2002 -work work ../../../../board/EthDE2/hardware/EthDE2/ip/black_scholes/ip/ALTFP_EXP/ALTFP_EXP.vhd
vcom -2002 -work work ../../../../board/EthDE2/hardware/EthDE2/ip/black_scholes/ip/ALTFP_INV/ALTFP_INV.vhd
vcom -2002 -work work ../../../../board/EthDE2/hardware/EthDE2/ip/black_scholes/ip/ALTFP_INV_SQRT/ALTFP_INV_SQRT.vhd
vcom -2002 -work work ../../../../board/EthDE2/hardware/EthDE2/ip/black_scholes/ip/ALTFP_LOG/ALTFP_LOG.vhd
vcom -2002 -work work ../../../../board/EthDE2/hardware/EthDE2/ip/black_scholes/ip/ALTFP_MULT/ALTFP_MULT.vhd
vcom -2002 -work work ../../../../board/EthDE2/hardware/EthDE2/ip/black_scholes/ip/ALTFP_SQRT/ALTFP_SQRT.vhd
vcom -2002 -work work ../../../../board/EthDE2/hardware/EthDE2/ip/black_scholes/black_scholes_did2.vhd
vcom -2002 -work work ../../../../board/EthDE2/hardware/EthDE2/ip/black_scholes/black_scholes_cndf.vhd
vcom -2002 -work work ../../../../board/EthDE2/hardware/EthDE2/ip/black_scholes/black_scholes_optionprice.vhd
vcom -2002 -work work ../../../../board/EthDE2/hardware/EthDE2/ip/black_scholes/black_scholes_process.vhd
vcom -2002 -work work ../../../../board/EthDE2/hardware/EthDE2/ip/black_scholes/black_scholes_top.vhd
vcom -2002 -work work ../../../../board/EthDE2/hardware/EthDE2/ip/kernel_top/kernel_top.vhd


3. In the tb_top.vhd file, change the entity name kernel_top to the top-level entity name of the current design, such as black_scholes_top. The tb_top.vhd file is located in the folder %DUCAOCLSDKROOT%\designs\fpga_src_kernel\sim\tb.

-- u_kernel : kernel_top
u_kernel : black_scholes_top
    port map(
        rst_125m => s_system_reset,
        clk_125m => s_clk_125m,
        valid    => s_valid_test,
        rdata    => s_rdata_test,
        wen      => s_wen,
        wdata    => s_wdata
    );


4. Prepare the tick_data for the verification system.

4-1 Open the file %DUCAOCLSDKROOT%\designs\fpga_src_kernel\sim\tick_data\data_in_float.txt and enter the design test data, for example the Black-Scholes design test data.

4-2 Use the tool float_to_ieee745_for_x86.exe to convert the float data to IEEE 754 hex format in data_in.txt.


5. Compile and verify the FPGA algorithm in ModelSim.

5-1 Open ModelSim and change the project directory to %DUCAOCLSDKROOT%\designs\fpga_src_kernel\sim\env.

5-2 Run compile_all.do to compile all the design resources into the library. The necessary FPGA library files are located in ..\lib\altera (220model.vhd, 220pack.vhd, altera_mf.vhd, altera_mf_components.vhd, lpm_pack.vhd).

5-3 Run tb_01.do to load the tb_top testbench with vsim.

5-4 Load the Tcl commands that access the registers to start the algorithm execution. For the register addresses, refer to page_ctrl.vhd in ..\emulators\.

5-5 Open ..\tick_data\data_out.txt to check the result. A tool is available to convert the IEEE 754 hex data to float format. Note that the result is 0.2493324 for 3E7F5100; the software algorithm result obtained later is the same. This verification platform is intended for debugging the VHDL/Verilog resources by checking timing.

5-6 Check the timing to verify the algorithm design.
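
The hex-to-float conversion mentioned in step 5-5 can be sketched in C as follows. The article's own conversion tool is not shown, so the function name `hex_to_float` is hypothetical, and the sketch assumes that `float` is a 32-bit IEEE 754 single-precision value on the machine running it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Interpret 8 hex digits as the bit pattern of an IEEE 754 single-precision float. */
float hex_to_float(const char *hex)
{
    uint32_t bits;
    float value;

    assert(hex != NULL);
    bits = (uint32_t)strtoul(hex, NULL, 16);
    memcpy(&value, &bits, sizeof value);  /* reinterpret the bits, no numeric conversion */
    return value;
}
```

With this sketch, `hex_to_float("3E7F5100")` decodes to approximately 0.2493324, the value quoted in step 5-5.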



Hardware Design Synthesis flows

The synthesis process is executed from the command line.

Perl commands:

- Aoclkc version: Print out version information
- Aoclkc help: Show this message
- Aoclkc clean: Clean the redundant files
- Aoclkc clean_project: Clean the files in the Quartus II project, such as *.rpt and *.sof
- Aoclkc filelist <algorithm top file entity name>: Start the Quartus II FPGA project flow, including Synthesis/Fit/Timing/Assemble
- Aoclkc download <project bit file name>: Download the .sof file to the FPGA through USB-Blaster


1. Prepare the filelist.flist file in %DUCAOCLSDKROOT%\designs\fpga_src_kernel.

black_scholes/ip/ALTFP_ADD_SUB/ALTFP_ADD_SUB.vhd
black_scholes/ip/ALTFP_CONVERT_FtoI/ALTFP_CONVERT_FtoI.vhd
black_scholes/ip/ALTFP_CONVERT_ItoF/ALTFP_CONVERT_ItoF.vhd
black_scholes/ip/ALTFP_DIV/ALTFP_DIV.vhd
black_scholes/ip/ALTFP_EXP/ALTFP_EXP.vhd
black_scholes/ip/ALTFP_INV/ALTFP_INV.vhd
black_scholes/ip/ALTFP_INV_SQRT/ALTFP_INV_SQRT.vhd
black_scholes/ip/ALTFP_LOG/ALTFP_LOG.vhd
black_scholes/ip/ALTFP_MULT/ALTFP_MULT.vhd
black_scholes/ip/ALTFP_SQRT/ALTFP_SQRT.vhd
black_scholes/black_scholes_cndf.vhd
black_scholes/black_scholes_did2.vhd
black_scholes/black_scholes_optionprice.vhd
black_scholes/black_scholes_process.vhd
black_scholes/black_scholes_top.vhd

The paths must match the directory layout of the folder %DUCAOCLSDKROOT%\board\EthDE2\hardware\EthDE2\ip\black_scholes.


2. Enter the design folder %DUCAOCLSDKROOT%\board\EthDE2\hardware\EthDE2 from the command line.

3. Use Aoclkc clean and aoclkc clean_project to clean this project.

Note: The tool reports that it could not find these files because I had already deleted them.

4. Use Aoclkc filelist black_scholes_top to start the Quartus II project flow.

Running this project generates the .sof file. The process takes about 1 hour.

5. Check that the USB-Blaster connection and the hardware environment are ready.

6. Use Aoclkc download to download the .sof file to the FPGA.



Nios II Design flows

1. Open the Nios II Software Build Tools for Eclipse.

2. Select the Nios II development folder in %DUCAOCLSDKROOT%\board\EthDE2\hardware\EthDE2\software.

3. Download the .elf file to the FPGA.

Configure Successful, Connection is successful!

4. Test the connection.

Local computer IP address:

Remote FPGA IP address: 192.168.21.171




C/C++ Software design flows

1. Open VS2010 and load the project located at %DUCAOCLSDKROOT%\designs\software_src\host\host.sln.

2. Implement the software algorithm in kernel_softare and black_scholes_process. In this version the algorithm is:

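$$c = S_0 N(d_1) - K e^{-rT} N(d_2)$$

$$p = K e^{-rT} N(-d_2) - S_0 N(-d_1) = c + K e^{-rT} - S_0$$

$$d_1 = \frac{\ln(S_0/K) + (r + \sigma^2/2)\,T}{\sigma\sqrt{T}}$$

$$d_2 = \frac{\ln(S_0/K) + (r - \sigma^2/2)\,T}{\sigma\sqrt{T}} = d_1 - \sigma\sqrt{T}$$

$$N(x) = \begin{cases} 1 - N'(x)\,(a_1 k + a_2 k^2 + a_3 k^3 + a_4 k^4 + a_5 k^5), & x \ge 0 \\ 1 - N(-x), & x < 0 \end{cases}$$

where

$$k = \frac{1}{1 + \gamma x}, \quad \gamma = 0.2316419, \quad N'(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$$

$$a_1 = 0.319381530, \quad a_2 = -0.356563782, \quad a_3 = 1.781477917, \quad a_4 = -1.821255978, \quad a_5 = 1.330274429$$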


3. Change to the folder %DUCAOCLSDKROOT%\designs\analysis_data and edit original_data.txt, which contains the original test data. In this case:

SENDNUM=   --- Protocol head
00000006   --- Process data number (for multi-input data)
1.000000   --- Black Scholes itype data
22.50000   --- Spot price $22.5
24.00000   --- Strike price $24
0.067000   --- Risk-free interest 6.7%
3.000000   --- Time 3
0.080000   --- Volatility 8%

4. Run host.exe to start the software and hardware processing. host.exe is located in the folder %DUCAOCLSDKROOT%\designs\software_src\host\Debug.

5. Return to the folder %DUCAOCLSDKROOT%\designs\analysis_data and use the tool process_update_for_x86.exe to convert the results and write them into hardware_out.txt and software_out.txt.

6. Analyze the hardware and software results to evaluate the performance of the hardware algorithm. In this case, you can see:

Software result:

Hardware result:



Using the ModelSim timing waveform, we can easily find where an error occurs. When I checked the timing, I noticed that the difference comes from two sources:

1. The FPGA uses 32-bit float rather than 64-bit double.

2. The software rounds up when the last digit is bigger than 5, while the hardware simply drops it.
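
The first of these effects can be quantified with a small C helper that measures the relative error between a hardware value and a software value and counts how many result pairs differ by more than a chosen tolerance. The function names and the tolerance are hypothetical; the article itself suggests R or Matlab for this analysis, so this is only a sketch of the idea.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Relative error of a hardware value against the software reference. */
double relative_error(double hw, double sw)
{
    double scale = fabs(sw);
    if (scale == 0.0)
        return fabs(hw);          /* fall back to absolute error at zero */
    return fabs(hw - sw) / scale;
}

/* Error introduced by storing a double in a 32-bit float and reading it back. */
double narrowing_error(double x)
{
    return fabs((double)(float)x - x);
}

/* Count result pairs whose relative error exceeds the tolerance. */
size_t count_mismatches(const double *hw, const double *sw, size_t n, double tol)
{
    size_t i, bad = 0;

    assert(hw != NULL && sw != NULL);
    for (i = 0; i < n; ++i) {
        if (relative_error(hw[i], sw[i]) > tol)
            ++bad;
    }
    return bad;
}
```

A single round trip through 32-bit float changes a value by a relative amount of at most about 6e-8, so a comparison tolerance should be chosen somewhat above that level to allow for errors that accumulate across the operator chain.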

Hardware Black Scholes Algorithm structure

This structure is not the best one, for two reasons:

1. Its latency is too high. It takes about 1 µs to process just one Black-Scholes calculation.

2. The module does not take advantage of the pipelining capability of the FPGA. The next group cannot start being processed until the current group has finished.

Generally, if we spend time reanalyzing this structure, we can find many places to improve. But it is sufficient for demonstrating this hardware algorithm development platform.
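
To put the two limitations in numbers, the following C sketch converts a latency into clock cycles and compares the throughput of a blocking design with that of a fully pipelined one. The 125 MHz clock is an assumption taken from the clk_125m port name in the testbench, and the one-result-per-cycle pipeline is a hypothetical best case, not a measured figure.

```c
#include <assert.h>

/* Number of clock cycles covered by a latency given in nanoseconds. */
unsigned long latency_in_cycles(unsigned long latency_ns, unsigned long clock_mhz)
{
    assert(clock_mhz > 0);
    return latency_ns * clock_mhz / 1000UL;
}

/* Results per second when each calculation must finish before the next starts. */
double results_per_second_blocking(unsigned long latency_ns)
{
    assert(latency_ns > 0);
    return 1e9 / (double)latency_ns;
}

/* Results per second when a new calculation starts every interval_cycles clocks. */
double results_per_second_pipelined(unsigned long clock_mhz, unsigned long interval_cycles)
{
    assert(interval_cycles > 0);
    return (double)clock_mhz * 1e6 / (double)interval_cycles;
}
```

Under these assumptions, a 1 µs latency corresponds to 125 cycles, the blocking design delivers about one million results per second, and a pipeline accepting one input per cycle would deliver 125 million.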


[Figure: black_scholes_process.vhd module. Inputs SpotPrice, StrikePrice, Rate, Timetm and Volatility feed the D1D2 block, which produces d1 and d2; CNDF1 and CNDF2 compute N(d1) and N(d2); the OptionPrice block produces OptionPrice.]

[Figure: black_scholes_did2.vhd module. Inputs spot, strike, timetm, rate, volatility and the constant 0.5; divide, Sqrt, multiply, Ln, add and subtract operators compute Ln(S0/K), σ√T and (r + σ²/2)T, then d1 and d2. Operator annotations in the figure: / 6, Sqrt 16, X 5, Ln 21, + 7, - 7.]

[Figure: black_scholes_cndf.vhd module. InputX passes through add, subtract, multiply, Exp and INV operators with the constants γ, -0.5, 1.0, a1 to a5 and Inv_sqrt_2xPI to form k, k² to k⁵ and (a1k + a2k² + a3k³ + a4k⁴ + a5k⁵), producing OutputX. Operator annotations in the figure: + 7, - 7, X 5, Exp 17, INV 20.]

[Figure: black_scholes_optionprice.vhd module.]

Future improvement

In the next version:

1. Use the OpenCL standard between the C/C++ software and the FPGA platform.

2. Add more commands, such as a Nios II download command to download the .elf file from the command line.

3. Add a function to change the target board easily.

4. Add a function to set probes that extract the values of specified signals in hardware simulation. This is very helpful for fixing bugs quickly when comparing the outputs of the software and hardware algorithm sub-modules.