Case Studies in Acceleration of
Heston’s Stochastic Volatility
Financial Engineering Model:
GPU,Cloud and FPGA Implementations
by
Christos Delivorias
Supervised by
Dr. Peter Richtárik
and Martin Takáč
Dissertation presented for the Degree of
MSc in Operational Research
The School of Mathematics
August 2012
I want to thank first and foremost Mr. Erik Vynckier of Scottish Widows Investment Partnership (SWIP) in Edinburgh, both for giving me the opportunity to work on this project and for his support and guidance throughout it. He has been an invaluable mentor to me.
I would also like to thank my academic supervisor Dr. Peter Richtárik for his help, and for believing in me and suggesting me for this project. I especially want to thank Mr. Martin Takáč in his role as co-supervisor. His experience with aspects of GPU parallelisation has been invaluable, as has his instantaneous availability to answer my questions. Many aspects of this thesis might have been left uncompleted without his assistance and guidance.
I would also like to thank Messrs Robin Bruce and Steven Hutt at Maxeler Technologies for their hospitality in London and their support in getting to grips with Dataflow processing. On the same note, I'd like to acknowledge the help of Messrs Tuomas Eerola and Rainer Wehkamp at Techila Technologies for extending to myself and SWIP the use of their grid processing platform. Acknowledgments also go out to Mr. John Holden for providing a trial licence for the Numerical Algorithms Group (NAG) libraries for Matlab to use in this project; similarly, to Mr. John Ashley at NVIDIA Corp for his input regarding the physical setup of the GPU server, to Mathworks for providing a trial licence for Matlab to run on the GPU server, and to Boston Limited for providing the GPU test platform.
Finally, and most importantly for myself, I want to thank my girlfriend and partner in life Eugenia Metzaki. My love, I would not have made it this far without your help, your support, and your faith in me.
Thank you.
Declaration
I declare that this thesis was composed by myself and that the work contained therein is my own, except where explicitly stated otherwise in the text.
(Christos Delivorias)
Abstract
Here we present a comparative insight into the performance of the Heston stochastic volatility model on different acceleration platforms. The model was tested against a MacBook's CPU, a Techila grid server hosted on Microsoft's Azure cloud, a GPU node hosted by Boston Ltd, and an FPGA node hosted by Maxeler Technologies Ltd. Timing data was collected and compared against the CPU baseline to quantify the acceleration benefits of each platform.
Table of Contents
List of Tables v
List of Figures vi
Acronyms and Abbreviations viii
Chapter 1 Introduction 1
1.1 Motivation...............................1
1.2 Outline of the Thesis.........................2
Chapter 2 Option Pricing Theory 3
2.1 Arbitrage-free and complete market.................3
2.2 Risk-neutral valuation........................4
2.3 Feynman-Kac theorem........................4
2.4 Black & Scholes (B&S) model....................5
Chapter 3 Underlying Price Volatility 6
3.1 Stochastic Volatility.........................6
3.2 Heston’s Model............................8
3.2.1 Cox-Ingersoll-Ross Process..................9
3.2.2 Heston Partial Differential Equation (PDE)........11
3.2.3 Heston’s closed-form analytical solution...........13
3.2.4 Double Heston Model.....................14
3.3 Bates’ Model.............................15
3.3.1 Affine Jump Diffusion Functions...............15
Chapter 4 Monte Carlo Simulation 17
4.1 Introduction..............................17
4.2 Simple Monte Carlo (MC) algorithm................18
4.3 Variance reduction..........................19
4.3.1 Control variate technique...................19
4.3.2 Antithetic variates technique.................21
4.3.3 Quasi-random simulation...................22
4.4 Discretisation Schemes for Stochastic Differential Equations...23
4.4.1 Euler-Maruyama scheme...................23
4.4.2 Milstein scheme........................24
4.4.3 Kahl-Jäckel scheme......................25
4.4.4 Broadie-Kaya exact calculation scheme...........25
4.4.5 Quadratic Exponential (QE) scheme............26
4.5 Generating random realisations of variates.............27
4.5.1 General considerations....................27
4.5.2 Sampling methods......................29
4.5.3 Uniform random samples...................29
4.5.4 Normal random samples...................30
Chapter 5 Dataflow programming on Field Programmable Gate
Arrays (FPGAs) 35
5.1 Applications of FPGAs in computational finance..........35
5.2 FPGA versus Intel multi-core....................36
5.3 Scope for FPGAs...........................37
5.4 Application to the Heston model..................38
Chapter 6 Implementation on Graphical Processor Units (GPUs) 40
6.1 Using GPUs to solve large parallel problems............40
6.2 MATLAB implementation of the Heston model for the General
Purpose computing on GPU (GPGPU)...............42
Chapter 7 Implementation on the Techila cloud 45
7.1 The Techila cloud platform......................45
7.2 Description of platform implementation...............47
Chapter 8 Efficiency and Accuracy 48
8.1 Performance of Monte Carlo Simulation...............48
8.2 Accuracy of Simulation........................50
8.2.1 Bias and Variance of Results.................50
8.2.2 Standard Error........................51
Chapter 9 Results and conclusions 53
9.1 Acceleration results..........................53
9.2 Research Outcomes..........................55
9.3 Future Work..............................55
Appendix A Code i
List of Tables
6.1 GPGPU specifications for the dual Kepler cards...........41
6.2 GPGPU specifications from NVIDIA’s website...........42
9.1 Acceleration results for all platforms................54
List of Figures
3.1 Prices of 3 indices from 1990 to 2012................7
3.2 GARCH volatility of 3 indices from 1990 to 2012.........8
3.3 A plotting of the Cox-Ingersoll-Ross process............10
4.1 Monte Carlo (MC) simulation procedure...............32
4.2 Sobol sequence for quasi-random variates..............33
4.3 Niederreiter sequence for quasi-random variates............33
4.4 Faure sequence for quasi-random variates..............34
5.1 Central Processing Unit (CPU) Architecture............37
5.2 The FPGA architecture.......................38
5.3 Code use with the Maxeler Technologies..............39
6.1 Memory space allocation on a GPU.................41
6.2 Schematic of the multiple cores in the NVIDIA Fermi GPU....44
7.1 Parallel problem architecture.....................45
7.2 Embarrassingly parallel problem architecture............46
7.3 Techila Architecture.........................47
8.1 Matlab figures for the Monte Carlo simulation of the Heston model 49
8.2 Call option prices with multiple models...............51
8.3 Monte Carlo standard error.....................52
9.1 Comparison of accelerations.....................54
List of Algorithms
1 MC Simulation(N).........................18
2 Monte Carlo Simulation with control variate(t;N)..21
3 Monte Carlo Simulation with antithetic variates(N).22
4 Quadratic Exponential Variance Reduction.......28
Acronyms and Abbreviations
ALU Arithmetic Logic Unit................................................37
AJD Affine Jump Diffusion.................................................11
ATM At the Money........................................................50
BLAST Basic Local Alignment Search Tool................................46
B&S Black & Scholes.......................................................5
CIR Cox-Ingersoll-Ross......................................................9
CPU Central Processing Unit...............................................35
KS Kolmogorov-Smirnov....................................................29
CUDA Compute Unified Device Architecture...............................42
CI Confidence Interval......................................................48
DoF Degrees of Freedom.....................................................6
GPU Graphical Processing Unit.............................................2
GPGPU General Purpose computing on GPU..............................40
HW Half-Width............................................................18
HDL Hardware Description Language......................................35
HFT High Frequency Trading...............................1
i.i.d.independent and identically distributed................................16
LCG Linear Congruential Generator........................................29
MC Monte Carlo...........................................................17
MCMC Markov Chain Monte Carlo........................................50
MT Mersenne Twister......................................................30
NAG Numerical Algorithms Group.........................................23
FPGA Field Programmable Gate Array.....................................2
GARCH Generalized Auto-Regressive Conditional Heteroskedasticity........6
GBM Geometric Brownian Motion..........................................4
GPU Graphical Processor Unit..............................................2
OS Operating System.......................................................36
PCT Parallel Computing Toolbox..........................................53
PCIe Peripheral Component Interconnect express...........................36
PDE Partial Differential Equation...........................................2
PDF Probability Density Function..........................................29
PRNG Pseudo-Random Number Generator................................28
QE Quadratic Exponential..................................................26
RV Random Variable........................................................5
SDE Stochastic Differential Equation........................................2
TRNG True-Random Number Generator...................................28
Chapter 1
Introduction
The purpose of this thesis is to explore different ways to accelerate a particular financial model, specifically the Heston stochastic volatility model. The platforms examined are, at the present time, the cutting edge in the parallelisation of algorithms. For me this was completely novel and uncharted territory. Coming from a background in Computer Science, and having taken a risk-management view through the MSc in Operational Research, this project combined aspects of financial mathematics that I was challenged to come to grips with quickly. It has been a steep learning curve, but I have thoroughly enjoyed mastering this new domain of knowledge.
1.1 Motivation
For a long time now, even before the 2008 financial crisis, the derivatives sector has been in need of a fast and efficient way to calculate the prices of options given certain market data. The use of this is twofold. First, the ability to price an option on a given underlying is important in a fast-paced market environment. The second use is in a relatively new sector of the market that deals with High Frequency Trading (HFT); this sector relies on extremely fast computations in order to make algorithmic decisions based on the current market status. This thesis will focus on the pricing of options rather than the aspects of HFT. The informed reader may wish to look into some of the negative aspects of this practice, as explained by Easley et al. [ELO].
This joint project between the University of Edinburgh and Scottish Widows Investment Partnership aims to explore the possibilities of accelerating financial engineering models, with real-life applications in the markets. The goal of this thesis was to test the same model on different platforms and assess the acceleration benefits that each platform provided.
1.2 Outline of the Thesis
Chapter 2 introduces the pricing theory behind options and states some of the shortcomings of the earlier models. This chapter links the Stochastic Differential Equations (SDEs) with the Partial Differential Equations (PDEs). The introduction of stochastic volatility to combat the issues raised in Chapter 2 is addressed in Chapter 3. Chapter 4 explains the theory and the implementation behind evaluating an SDE using a Monte Carlo simulation. Chapters 5, 6, and 7 go into more detail on the acceleration platforms of Field Programmable Gate Arrays (FPGAs), Graphical Processor Units (GPUs), and the Techila Cloud, respectively. Chapter 8 details the efficiency and the accuracy of the implementation of the Heston model in Matlab, and finally Chapter 9 presents the experimental results and concludes the thesis.
Chapter 2
Option Pricing Theory
This chapter will touch upon the fundamentals of arbitrage theory in pricing the fair value of an option. An option provides the bearer with the opportunity, but not the obligation, to sell (put) or buy (call) the underlying at a fixed price (strike). If the option is not exercised nothing happens, aside from the loss of the premium paid for the option. There are currently two ways to achieve this pricing: one is via the option's characteristic PDE and the other is the Monte Carlo simulation.
The PDE approach will be presented in Section 2.4 and will be used to analytically evaluate the price of the Heston model in Section 3.2. The risk-neutral approach in Section 2.2 will be used to numerically simulate the price of the underlying with the Monte Carlo simulation, as presented in Chapter 4.
2.1 Arbitrage-free and complete market
Arbitrage is the practice of identifying and taking advantage of mis-priced assets that are trading in multiple exchanges. In effect this is taking advantage of someone selling an asset for a lower price than another party is willing to buy it for, and vice-versa. By acting in between the price spread, the arbitrageur can make a profit. The act of arbitrage itself changes the underlying's value and thus pushes it towards an equilibrium. An arbitrage-free market is one where such imbalances do not exist. This idea is crucial for the risk-neutral valuation of an option.
Given a portfolio with M underlying assets, excluding the risk-free asset, and R the number of sources of randomness, it is possible to define a "meta-theorem" to determine whether the portfolio represents a complete and arbitrage-free market. As stated in [Bj9, p. 122], we can define the following rule of thumb.
Theorem 2.1. Let M denote the number of underlying traded assets in the model excluding the risk-free asset, and let R denote the number of random sources. Generically we then have the following relations:
1. The model is arbitrage free if and only if M ≤ R.
2. The model is complete if and only if M ≥ R.
3. The model is complete and arbitrage free if and only if M = R.
Remark. The practical implication of a complete market is that options are hedgeable and can therefore be priced. On the other hand, an incomplete market is one where derivative securities cannot be perfectly hedged, and the price of an option cannot be derived from the prices of its underlying assets.
For the purposes of this thesis we will assume a frictionless market, with no taxes, transaction costs or premiums. For simplicity of the modelling we will also assume a non-dividend-paying underlying. We will proceed to examine the relationship between two assets: the option and the underlying.
2.2 Risk-neutral valuation
The idea of a risk-neutral valuation states that a derivative's price is calculated as the expectation of the discounted price at maturity under a risk-neutral measure. What this implies is that the discounted option processes are indeed martingales under the risk-neutral measure in a no-arbitrage market. In this case the value of the derivative is the expected value at maturity of the payoff, and the expected return of the underlying asset equals the risk-free rate.
2.3 Feynman-Kac theorem
The Feynman-Kac theorem is fundamental to financial modelling as it links the PDE to the SDE domain. What follows is a brief presentation of the theorem; it is explained in detail in [Kle05].
Theorem 2.2. Let $x_t$ follow the SDE
$$dx_t = \mu(x_t,t)\,dt + \sigma(x_t,t)\,dW^Q_t, \qquad (2.3.1)$$
where $W^Q_t$ is a Brownian motion under the probability measure $Q$. Now let $f(x_t,t)$ be a differentiable function of $x_t$ and $t$ that follows the PDE
$$\frac{\partial f}{\partial t} + \mu(x_t,t)\frac{\partial f}{\partial x} + \frac{1}{2}\sigma(x_t,t)^2\frac{\partial^2 f}{\partial x^2} - r(x_t,t)\,f(x_t,t) = 0,$$
subject to the boundary condition $f(X_T,T)$. Under the provisions of the theorem, $f$ has the solution
$$f(x_t,t) = \mathbb{E}^Q\!\left[\, e^{-\int_t^T r(x_u,u)\,du}\, f(X_T,T) \,\big|\, \mathcal{F}_t \right]. \qquad (2.3.2)$$
2.4 Black & Scholes (B&S) model
The B&S model formulates the assumption that the price of an asset follows a GBM,
$$dS_t = \mu S_t\,dt + \sigma S_t\,dW_t, \qquad (2.4.1)$$
where the drift $\mu$ and the volatility $\sigma$ are assumed to be constant. This means that the asset prices are log-normally distributed. This, however, is at odds with recent empirical observations, which indicate that the volatility is non-Gaussian and in fact stochastic.
The price $C_T$ of a vanilla call option according to B&S is
$$C_T = e^{-rT}\,\mathbb{E}\!\left[\left(S_0\, e^{(r-\frac{1}{2}\sigma^2)T + \sigma\sqrt{T}\,N_{(0,1)}} - K\right)^{+}\right], \qquad (2.4.2)$$
where $r$ is the risk-free interest rate, $T$ the time to maturity, $K$ the strike price, $S_0$ the spot price at $t=0$, $\sigma$ the volatility, and $N_{(0,1)}$ a Gaussian Random Variable (RV); in this example the payoff $f$ of the call option is
$$f = \left(S_0\, e^{(r-\frac{1}{2}\sigma^2)T + \sigma\sqrt{T}\,N_{(0,1)}} - K\right)^{+}.$$
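As a concrete illustration of Equation (2.4.2), the expectation can be evaluated directly by integrating the discounted payoff against the standard normal density. The Matlab/Octave sketch below does this by quadrature; the parameter values are illustrative assumptions and not taken from the thesis.

% Sketch: price a vanilla call by evaluating the risk-neutral expectation
% in Eq. (2.4.2) as an integral over the standard normal density.
% All numerical values below are illustrative assumptions.
S0 = 100; K = 100; r = 0.05; sigma = 0.2; T = 1;
payoff = @(z) max(S0*exp((r - 0.5*sigma^2)*T + sigma*sqrt(T).*z) - K, 0);
phi    = @(z) exp(-z.^2/2)/sqrt(2*pi);      % standard normal density
C = exp(-r*T) * quadgk(@(z) payoff(z).*phi(z), -10, 10);
fprintf('B&S call price by quadrature: %.4f\n', C);

For these inputs the result agrees with the standard Black & Scholes closed-form price, which provides a useful sanity check before moving to the Monte Carlo estimates of Chapter 4.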
Chapter 3
Underlying Price Volatility
For many years the B&S model was the de facto method for pricing derivative securities. The purpose of the B&S model was to create a risk-neutral position that could be perfectly hedged against market volatility. In their seminal paper Fischer Black and Myron Scholes [BS73] described the partial differential equation that governs the price of a derivative over time. The key insight of this approach was to hedge the security by buying or selling the underlying at a certain ratio Δ, which was calculated by the B&S equation. This kind of strategy is also referred to as dynamic delta hedging.
However, more recent criticism [HT09] has highlighted the weaknesses and limitations of the model. One of the most prominent remarks is the incapacity of the Gaussian distribution to properly capture the probability of events in the tails of the distribution. Suggestions have been made to use the Student's t-distribution, which is a fat-tailed distribution for low Degrees of Freedom (DoF).
3.1 Stochastic Volatility
As is suggested in [Kwo08], there is an apparent clustering of volatility with regard to an asset's value. The clustering is characterised by a mean-reverting behaviour of the volatility. The assumption made by B&S was that of a constant volatility. However, empirical evidence has shown that the volatility of an asset price is not described very well by the Normal distribution; in fact the volatility follows a leptokurtic distribution¹. Figure 3.1 shows the prices of 3 indices (FTSE100, S&P500, and NIKKEI225) from 1990 until today. Figure 3.2 shows the volatility of each index modelled with GARCH(1,1)² since 1990. It is apparent that the volatility is not normally distributed.
¹ Essentially the distribution has higher peaks around the mean and fatter tails than a Gaussian distribution.
² The Generalized Auto-Regressive Conditional Heteroskedasticity (GARCH) model aims to more accurately describe the phenomenon of volatility clustering and related effects such as kurtosis. Volatility clustering is the effect that large changes in prices tend to be followed by large changes, of either sign, and vice versa.
[Figure 3.1 plot panels: adjusted close price ($) against time (date), 31-Dec-1989 to 12-Jul-2012, for FTSE100, S&P500, and NIKKEI225.]
Figure 3.1
The FTSE100, S&P500, and NIKKEI225 index values from 1990 to 2012. The Matlab code for these plots can be found in Listing A.3.
Since volatility drives the asset price, it became apparent that a more realistic model was required to represent volatility. Heston [Hes93] showed that the B&S model could be extended by taking into account a stochastically moving volatility.
[Figure 3.2 plot panels: daily returns with the GARCH(1,1) conditional standard deviation overlaid, 31-Dec-1989 to 12-Jul-2012, for FTSE100, S&P500, and NIKKEI225.]
Figure 3.2
The corresponding conditional variance (in red) calculated with GARCH(1,1), overlaid on the daily returns of each index. The Matlab code for these plots can be found in Listing A.3.
3.2 Heston’s Model
The Heston model extends the B&S model by taking into account a stochastic volatility that is mean-reverting and is also correlated with the asset price. It assumes that both the asset price and its variance are driven by a joint stochastic process.
Definition 3.1. The Heston model, which defines the price of the underlying $S$ and its stochastic variance $v$, is given by
$$dS_t = \mu S_t\,dt + \sqrt{v_t}\,S_t\,dW^S_t,$$
$$dv_t = \kappa(\theta - v_t)\,dt + \sigma\sqrt{v_t}\,dW^v_t,$$
$$\mathrm{Cov}\langle dW^S_t, dW^v_t\rangle = \rho\,dt, \qquad (3.2.1)$$
where the two Brownian motions $(W^S, W^v)$ are correlated with coefficient $\rho$, $\kappa$ is the rate of reversion of the variance to $\theta$ – the long-term variance –, $\sigma$ is the volatility of variance, and $\mu$ is the rate of return of the asset.
The Heston model is a superior choice to the B&S model since it provides dynamic stochastic volatility, as described in Equation (3.2.1). The model has a semi-analytic formula that can be exploited to derive an integral solution, which is described in §3.2.3. Additionally, if the Feller condition [Fel51] is upheld, the process produces strictly positive variance with probability 1; this is described in Lemma 3.2.
Lemma 3.2. If the parameters of the Heston model obey the condition
$$2\kappa\theta \geq \sigma^2,$$
then the stochastic process will produce a variance such that $\Pr(v_t > 0) = 1$, since the upward drift is large enough to strongly reflect the process away from the origin.
3.2.1 Cox-Ingersoll-Ross Process
Cox et al. [CIR85] described a Markov process with continuous paths defined by the SDE
$$dv_t = \kappa(\theta - v_t)\,dt + \sigma\sqrt{v_t}\,dW^v_t, \qquad (3.2.2)$$
where $\theta$ is the equilibrium supported by the fundamentals, $\kappa$ is the rate at which the shocks dissipate and the variance returns to $\theta$, $\sigma$ is the degree of volatility around it caused by shocks, and $dW^v_t$ is a Brownian motion. Figure 3.3 shows the plot of a Cox-Ingersoll-Ross (CIR) process.
This process is mean-reverting, ergodic³ and can be thought of as an Ornstein-Uhlenbeck⁴ process [see [Gil96] for more on implementation, and the Matlab code in Listing A.2].
³ An ergodic process is one whose statistical properties, e.g. mean and variance, can be deduced from a single, sufficiently long realisation of the process.
⁴ Describes the velocity of a Brownian particle under friction. It is stationary, Gaussian, Markovian, and tends to drift towards its long-term mean.
The probability density function of the variance at a time s in the future, conditional on its value at the current time, $t \le s$, as described in [CIR85, p. 391], is
$$f(v_s,s;v_t,t) = c\,e^{-u-\nu}\left(\frac{\nu}{u}\right)^{q/2} I_q\!\left(2(u\nu)^{1/2}\right),$$
[Figure 3.3 plot panels: S(t) against time (business days, 0-250). Top: Euler discretisation sampling from N(0,1) for 1 year. Bottom: Euler discretisation sampling from the terminal non-central χ² distribution for 1 year.]
Figure 3.3
Two CIR processes for 5 periods, with different sampling distributions, where dt = 1, mean reversion rate κ = 3.0, long-term mean θ = 1, and volatility σ = 0.6. The process starts at S(0) = 5 and reverts to a long-term mean of 1 (the mean is marked in the figure with a horizontal black dashed line). Note how the top process breaches the lower barrier of 0 and becomes negative, while the second one is strongly reflected at 0.
where
$$c = \frac{2\kappa}{\sigma^2\left(1 - e^{-\kappa(s-t)}\right)},\qquad u = c\,v_t\,e^{-\kappa(s-t)},\qquad \nu = c\,v_s,\qquad q = \frac{2\kappa\theta}{\sigma^2} - 1,$$
and $I_q(\cdot)$ is the modified Bessel function of the first kind of order q. This conditional distribution is a non-central $\chi^2$, e.g. $v_t \sim \chi^2(2q+2,\,2u)$, with $2q+2 = 4\kappa\theta/\sigma^2$ degrees of freedom and non-centrality factor $2u = 2c\,v_t$. The terminal distribution as $s \to \infty$ is a gamma distribution, i.e. $f \sim \Gamma\!\left(\frac{2\kappa\theta}{\sigma^2}, \frac{\sigma^2}{2\kappa}\right)$⁵. The CIR process itself is a special case of the basic affine jump diffusion functions described in §3.3.1.
⁵ A Gamma distribution Γ(α, β) is defined by two numerical parameters: α, the shape parameter, which dictates the overall shape of the distribution, and β, the scale parameter, which defines by how much the distribution is stretched or shrunk.
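To make the discretisation issue illustrated in Figure 3.3 concrete, the following Matlab/Octave sketch simulates one CIR path with a simple Euler scheme; the parameter values mirror those quoted in the caption of Figure 3.3, and truncating the variance inside the square root is one simple remedy for negative values (the schemes of §4.4 treat this more carefully).

% Sketch: Euler discretisation of the CIR process dv = kappa*(theta - v)dt + sigma*sqrt(v)dW.
% Parameter values follow the Figure 3.3 caption; max(v,0) inside sqrt() is one
% simple way to handle paths that dip below zero.
kappa = 3.0; theta = 1.0; sigma = 0.6;     % mean-reversion rate, long-run mean, vol of vol
v0 = 5; T = 1; nSteps = 250; dt = T/nSteps;
v = zeros(nSteps+1, 1); v(1) = v0;
for i = 1:nSteps
    z = randn;                             % N(0,1) increment
    v(i+1) = v(i) + kappa*(theta - v(i))*dt + sigma*sqrt(max(v(i),0))*sqrt(dt)*z;
end
plot(0:nSteps, v); xlabel('time step'); ylabel('v(t)');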
3.2.2 Heston PDE
The process followed by the variance, as shown in Equation (3.2.1), is a CIR process. These kinds of processes, as presented in §3.2.1, are a subset of Affine Jump Diffusion (AJD) processes, in that they present no jumps. The following derivation of the Heston model's PDE is a special case of the PDE for general stochastic volatility models as described in [Gat06]. Given the Heston model as we defined it in §3.2 there are two random sources – the price and variance Wiener processes – and only one traded asset, the underlying of the option (since the volatility is not a traded asset⁶); then, according to [Bj9, p. 118], this model is arbitrage-free but incomplete. In order to make it complete we need to add another traded asset with associated stochastic risk, so that the portfolio can be hedged.
⁶ However, recent instruments have emerged that treat volatility as an asset class [see ??].
Theorem 3.3. Consider a portfolio Π(t) consisting of an option V(S,v,t,T) with maturity T, Δ units of the underlying stock S, and $\varphi_1$ units of another hedging option U(S,v,t,T) with maturity T. The option price V then satisfies the PDE
$$0 = \frac{\partial V}{\partial t} + \frac{1}{2}vS^2\frac{\partial^2 V}{\partial S^2} + \rho\sigma v S\frac{\partial^2 V}{\partial S\,\partial v} + \frac{1}{2}\sigma^2 v\frac{\partial^2 V}{\partial v^2} - rV + rS\frac{\partial V}{\partial S} + \left[\kappa(\theta - v) - \lambda(S,v,t)\right]\frac{\partial V}{\partial v},$$
where all the variables are as defined in §3.2 and λ is the market price of volatility risk.
Proof.Let (t) be a self-financing
7
hedged portfolio,consisting of an option on
an underlying asset S,the asset S itself,and another option U,
(t) = S +V (S;v;t;T) +
1
U(S;v;t;T);(3.2.3)
where t 2 [0;T],and v is the volatility as defined in the Heston model.We can
drop the dependent parameters for reason of brevity and look at the change in
the value of the portfolio during a quantum of time dt is
5
A Gamma distribution (;),is defined by two numerical parameters:,the shape pa-
rameter –which dictates the overall shape of the distribution–,and  the scale parameter,which
defines by how much the distribution is stretched/shrinked.
6
However recent instruments have emerged that treat Volatility as an asset class [see??]
7
i.e.there is no inwards/outwards cash flow.
11
d = dS +dV +
1
dU:(3.2.4)
From Itô’s lemma we have that the changes of the price of the options V and U
are,
dV =

@V
@t
+
1
2
vS
2
@
2
V
@S
2
+Sv
@
2
V
@S@v
+
1
2

2
v
@
2
V
@v
2

dt+
@V
@S
dS+
@V
@v
dv;(3.2.5)
dU =

@U
@t
+
1
2
vS
2
@
2
U
@S
2
+Sv
@
2
U
@S@v
+
1
2

2
v
@
2
U
@v
2

dt +
@U
@S
dS+
@U
@v
dv:(3.2.6)
We replace Equations 3.2.5 and 3.2.6 into 3.2.4 to get,
d = dS +dv + dt;(3.2.7)
where,
 =
@V
@S
++
1
@U
@S
;
 =
@V
@v
+
1
@U
@v
;
=
@V
@t
+
1
2
vS
2
@
2
V
@S
2
+vS
@
2
V
@S@v
+
1
2

2
v
@
2
V
@v
2
+
1

@U
@t
+
1
2
vS
2
@
2
UV
@S
2
+Sv
@
2
U
@S@v
+
1
2

2
v
@
2
V
@v
2

;
and where  and  above are as defined in Equation (3.2.1).
In order to provide a risk-neutral valuation it is necessary to eliminate the the
stochastic components of Equation (3.2.7).Hence we set  =  = 0 and derive
the following hedging parameters

1
= 
@V
@v
@U
@v
;
 = 
@V
@S

1
@U
@S
:
(3.2.9)
If there is no arbitrage to exist,then that the rate of return on the portfolio’s
value will be equal to the risk-free interest rate r - which we will assume is
deterministic for this thesis - leaving us with
d = dt = rdt = r(V S 
1
U)dt:(3.2.10)
Equation (3.2.10) yields that,
12
= r;
and hence,
r(S +V +
1
U) =

@V
@t
+
1
2
vS
2
@
2
V
@S
2
+vS
@
2
V
@S@v
+
1
2

2
v
@
2
V
@v
2

+
1

@U
@t
+
1
2
vS
2
@
2
UV
@S
2
+Sv
@
2
U
@S@v
+
1
2

2
v
@
2
V
@v
2

;
which we can abstract by writing
A+
1
B = r(S +V +
1
U):
When we substitute ;
1
from Equations 3.2.9 and re-arrange we get,
ArV +rS
@V
@S
@V
@v
=
B rU rS
@U
@S
@U
@v
:(3.2.11)
The LHS of Equation 3.2.11 is a function of just V,as is the RHS a function of
U alone.Hence there must be some function f(S;v;t) for which the equality of
Equation 3.2.11 holds.When we define this function as f(S;v;t) = ( v) +
(S;v;t),where  is the market price of volatility risk,following the proposal by
[Hes93].Thus Equation 3.2.11 can be written as,
ArV +rS
@V
@S
@V
@v
= ( v) +(S;v;t):
By re-arrangement and substitution of A we get,
0 =
@V
@t
+
1
2
vS
2
@
2
V
@S
2
+vS
@
2
V
@S@v
+
1
2

2
v
@
2
V
@v
2
rV +rS
@V
@S
+[( v) (S;v;t)]
@U
@v
:
3.2.3 Heston’s closed-form analytical solution
The Heston model's closed-form price C – as described in [Hes93] and defined in Definition 3.1 – for a European vanilla call option with strike price K, spot price S, and time to maturity T on a non-dividend-paying underlying satisfies Equation (3.2.12),
$$C(S,v,t) = S\,P_1 - K\,e^{-r(T-t)}\,P_2, \qquad (3.2.12)$$
where, writing τ = T − t and conditioning on the strike price K,
$$P_j(x,v,T;\ln[K]) = \frac{1}{2} + \frac{1}{\pi}\int_0^{\infty}\mathrm{Re}\!\left[\frac{e^{-i\phi\ln[K]}\,f_j(x,v,t;\phi)}{i\phi}\right]d\phi,$$
$$x = \ln[S_t],$$
$$f_j(x,v,t;\phi) = e^{C_j(T-t;\phi) + D_j(T-t;\phi)\,v + i\phi x},$$
$$C_j(\tau;\phi) = r\phi i\tau + \frac{\kappa\theta}{\sigma^2}\left[(b_j - \rho\sigma\phi i + d_j)\,\tau - 2\ln\!\left(\frac{1 - g_j e^{d_j\tau}}{1 - g_j}\right)\right],$$
$$D_j(\tau;\phi) = \frac{b_j - \rho\sigma\phi i + d_j}{\sigma^2}\left(\frac{1 - e^{d_j\tau}}{1 - g_j e^{d_j\tau}}\right),$$
and
$$g_j = \frac{b_j - \rho\sigma\phi i + d_j}{b_j - \rho\sigma\phi i - d_j}, \qquad d_j = \sqrt{(\rho\sigma\phi i - b_j)^2 - \sigma^2(2u_j\phi i - \phi^2)},$$
where $j = \{1,2\}$, $\tau = T - t$, φ is the integration parameter, $i = \sqrt{-1}$, λ is the price of volatility risk⁸,
$$u_1 = \tfrac{1}{2}, \quad u_2 = -\tfrac{1}{2}, \quad b_1 = \kappa + \lambda - \rho\sigma, \quad b_2 = \kappa + \lambda,$$
and the rest of the parameters are as defined in Definition 3.1. The Octave/Matlab code to implement Equation (3.2.12) is shown in Listing A.1 in Appendix A.
⁸ The choice of λ(S,v,t) can be assumed to be $\lambda(S,v,t) = \gamma\,\mathrm{Cov}[dv,\,dC/C]$, where C(t) is the consumption rate and γ is the relative risk aversion of an investor [see [Bre79]].
3.2.4 Double Heston Model
Another approach to the Heston model is to add a second variance process [GP09], independent of the first one:
$$dS_t = \mu S_t\,dt + \sum_{\imath=1}^{2} S_t\sqrt{v^{\imath}_t}\,dW^{S_\imath}_t, \qquad (3.2.13a)$$
$$dv^{\imath}_t = \kappa_\imath\left(\theta_\imath - v^{\imath}_t\right)dt + \sigma_\imath\sqrt{v^{\imath}_t}\,dW^{v_\imath}_t, \qquad \imath = \{1,2\}, \qquad (3.2.13b)$$
where the correlation matrix of $(W^{S_1}, W^{v_1}, W^{S_2}, W^{v_2})$ is
$$\begin{pmatrix} 1 & \rho_1 & 0 & 0\\ \rho_1 & 1 & 0 & 0\\ 0 & 0 & 1 & \rho_2\\ 0 & 0 & \rho_2 & 1 \end{pmatrix}. \qquad (3.2.13c)$$
This approach provides a more flexible model by increasing the DoF of the stochastic process. [CH09] showed that the introduction of a second, uncorrelated, variance process to the Heston model provides more flexible modelling of the time variation in the smirk⁹, leading to a better fit quality with comparable computational time. Equation (3.2.13c) shows the correlation structure between the Brownian motions.
⁹ A smirk is a skewed volatility smile; the volatility smile is the long-observed pattern in which at-the-money options tend to have lower implied volatilities than in- or out-of-the-money options.
3.3 Bates’ Model
Critical information regarding a company's performance arrives at random points in time and has an acute impact on the company's stock price. Because the information arrives at a random time, it can be treated and modelled as a random variable. To account for this behaviour Merton [Mer76] introduced a model for the discontinuous paths of asset prices. The behaviour of this model is very similar to the AJD functions described in §3.3.1.
The Merton and Heston models were combined by Bates [Bat96], who merged the stochastic volatility with the jump-diffusion augmentation. The stock price follows the SDE
$$dS_t = \mu S_t\,dt + \sqrt{v_t}\,S_t\,dW^S_t + \left(e^{\alpha + \delta\varepsilon} - 1\right)S_t\,dq_t,$$
$$dv_t = \kappa(\theta - v_t)\,dt + \sigma\sqrt{v_t}\,dW^v_t,$$
$$\mathrm{Cov}\langle dW^S_t, dW^v_t\rangle = \rho\,dt,$$
$$dq = \begin{cases} 0 & \text{with probability } 1 - \lambda\,dt,\\ 1 & \text{with probability } \lambda\,dt,\end{cases}$$
where dq is a compound Poisson process with constant intensity λ, and the jump size is lognormally distributed with mean log-jump α and standard deviation δ, where $\varepsilon \sim N_{(0,1)}$. The jump size distribution is not correlated with the Wiener processes.
3.3.1 Affine Jump Diffusion Functions
The basic AJD functions are of the following form:
$$dx(t) = \mu(x,t)\,dt + \sigma(x,t)\,dW(t) + dZ(t;\lambda,\vartheta),$$
$$\mu(x,t) = \mu_0 + \mu_1\,x(t),$$
$$\sigma^2(x,t) = H_0 + H_1\,x(t),$$
where (x;t) and 
2
(x;t) are affine functions,dW is a GBM and dZ(t) is a
compound Poisson process with  intensity and  independent jumps,which is
independent of W.The log-jump sizes Y
i
 N
(;
2
)
are independent and identi-
cally distributed (i.i.d.) random variables with mean  and variance 
2
which are
uncorrelated with both Z(t) and W(t).Jumps add mass to the returns distri-
bution,and thus address the issue of non-"fat-tail"that the B&S models suffers
from.By increasing  we can add mass to both tail,while a negative/positive 
add mass to the left/right tail respectively alone.
More recent explorations into jump diffusions can be seen in [Lip02],where
the log-exponential jumps are investigated to price exotic options.
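To illustrate how such a jump term enters a discretised path, the following Matlab/Octave sketch performs one Euler step of a jump-diffusion log-price, using the dq definition from the Bates model above as the jump indicator; all symbol names and numerical values are illustrative assumptions.

% Sketch: one Euler step of a log-price with a jump term, following the dq
% definition above (dq = 1 with probability lambda*dt, else 0).
% muJ/sigJ are the mean and std of the log-jump size; all values are illustrative.
mu = 0.05; sig = 0.2; lambda = 0.5; muJ = -0.1; sigJ = 0.15;
x = log(100); dt = 1/250;
dq = (rand < lambda*dt);                       % jump indicator for this step
J  = dq * (muJ + sigJ*randn);                  % log-jump size (0 if no jump)
x  = x + (mu - 0.5*sig^2)*dt + sig*sqrt(dt)*randn + J;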
Chapter 4
Monte Carlo Simulation
As mentioned in Chapter 2, the numerical method used to evaluate the value of a PDE is the Monte Carlo (MC) simulation. In this chapter we will set out the basics of this method and define the inputs and the intermediate processes required for this simulation to work.
4.1 Introduction
The reference to Monte Carlo is due to the homonymous city's affiliation with games of luck. The premise of luck is utilised within the simulation in order to draw a random sample from the overall probability space. If the random sample is as truly random as possible, then the subset of random samples is taken as an estimate of the volume of the entire probability space. The law of large numbers guarantees [RW] that this estimation will converge to the true value as the number of random draws increases. Given a certain number of random draws, the likely magnitude of the error can be derived from the central limit theorem.
As stated in §2.3, the Feynman-Kac theorem is the connective tissue between the PDE form of the stochastic model and its approximation by a Monte Carlo simulation. By this theorem it is possible to approximate PDEs of a certain form by simulating random paths and taking their expectation as the solution of the original PDE.
As an example we can return to the B&S Equation (2.4.2), where the price of the option depends on the expected value of the payoff. In order to calculate the expected value of f it is possible to run a MC simulation with N paths¹, in order to approximate the actual price of the call option C with the simulated price $\hat{C}_N$,
$$\hat{C}_N = \frac{e^{-rT}}{N}\sum_{\imath=1}^{N}\left(S_0\,e^{(r-\frac{1}{2}\sigma^2)T + \sigma\sqrt{T}\,N^{\imath}_{(0,1)}} - K\right)^{+}, \qquad (4.1.1)$$
where r is the risk-free interest rate, T is the time to maturity of the option, σ is the volatility, K is the strike price at maturity date T, $S_0$ is the spot price at t = 0, and the $N^{\imath}_{(0,1)}$ are Gaussian RVs. By the "strong law of large numbers" we have
$$\Pr\left(\hat{C}_N \to C\right) = 1, \quad \text{as } N \to \infty. \qquad (4.1.2)$$
¹ A path in this context is a discrete time-series of prices for the option, one for each discrete quantum of time.
4.2 Simple MC algorithm
Suppose we wish to estimate the call option price C := E[h(X)]; the algorithm to simulate the expected value is:
Algorithm 1 MC Simulation(N)
Require: Generate X_1, ..., X_N
Ensure: E[C] = Ĉ_N
1: for i = 1 to N do
2:   Y_i := h(X_i)
3:   S_i ← S_{i-1} + Y_i
4: end for
5: return Ĉ_N ← S_N / N
In order to determine the efficiency of the simulation, we need to calculate the confidence interval for this simulation. The approximate 100(1−α)% confidence interval for Algorithm 1 is defined as
$$\left[\hat{C}_N - z_{1-\frac{\alpha}{2}}\frac{\hat{\sigma}_N}{\sqrt{N}},\; \hat{C}_N + z_{1-\frac{\alpha}{2}}\frac{\hat{\sigma}_N}{\sqrt{N}}\right],$$
where $\hat{\sigma}_N$ is the sample standard deviation of Y based on $Y_1, \ldots, Y_N$. We can use the Half-Width (HW) of the confidence interval to measure the quality of the estimator $\hat{C}_N$,
$$\mathrm{HW} = z_{1-\frac{\alpha}{2}}\sqrt{\frac{\mathrm{Var}(Y)}{N}}. \qquad (4.2.1)$$
The smaller the HW, the more accurate the simulation of the expected value will be. There are two ways that this can be achieved: either increase N or reduce Var(Y). The first option is limited by the computational complexity of calculating the Y_i in a reasonable amount of time. We can, however, influence the variance of the generated Y_i values. There are multiple techniques to do the latter, and these are referred to as variance reduction.
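As a concrete illustration of Algorithm 1 and the half-width of Equation (4.2.1), the Matlab/Octave sketch below estimates the B&S call price of Equation (4.1.1) and reports a 95% confidence interval; the parameter values are illustrative assumptions.

% Sketch: plain Monte Carlo estimate of a European call (Eq. 4.1.1) with a
% 95% confidence interval built from the half-width of Eq. (4.2.1).
% Parameter values are illustrative assumptions.
S0 = 100; K = 100; r = 0.05; sigma = 0.2; T = 1; N = 1e5;
Z  = randn(N,1);                                        % N(0,1) variates
ST = S0*exp((r - 0.5*sigma^2)*T + sigma*sqrt(T)*Z);     % terminal prices
Y  = exp(-r*T)*max(ST - K, 0);                          % discounted payoffs
Chat = mean(Y);
HW   = 1.96*std(Y)/sqrt(N);                             % z_{0.975} = 1.96
fprintf('C_hat = %.4f, 95%% CI = [%.4f, %.4f]\n', Chat, Chat-HW, Chat+HW);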
4.3 Variance reduction
There are two major avenues for reducing variance in a MC simulation: one is to take advantage of specific features of the problem domain to adjust or correct the simulation outputs; the other is to directly reduce the variability of the simulation inputs. In this section we introduce control variates, antithetic variates, the absorbing technique, full truncation, and the case of quasi-random simulation.
4.3.1 Control variate technique
In order to improve the accuracy of the estimated variate, it is essential to use as much information as possible. We can derive very important information about the estimated variate if we can define another variate, whose expected value is not difficult to calculate, that is highly correlated with the estimated one. We can then use this new variate as a control mechanism to improve the accuracy of the simulation by minimising the error between the two variates.
Suppose that, as in Section 4.2, we wish to estimate the call option price C := E[Y], where Y := h(X) is for instance the discounted payoff of a call option for which we have no closed-form evaluation. We do, however, know of another variate Z, which we can easily generate and whose expected value E[Z] we can also quickly calculate. Now suppose that for each replication of Y_i we also generate a value of Z_i, and that the pairs (Y_i, Z_i) are i.i.d.; then for any fixed c we can calculate
$$Y^c_i = Y_i + c\,(Z_i - \mathbb{E}[Z])$$
from the i-th path and thus calculate the sample mean
$$\hat{Y}^c = \hat{Y} + c\,(\hat{Z} - \mathbb{E}[Z]) = \frac{1}{N}\sum_{i=1}^{N}\left(Y_i + c\,(Z_i - \mathbb{E}[Z])\right).$$
This is an unbiased estimator of E[Y]², which can act as a control variate estimator; that is, the observed residual Z − E[Z] acts as a control when estimating E[Y].
If we now compute the variance of $\hat{Y}^c$ we get
$$\mathrm{Var}\left[\hat{Y}^c\right] = \mathrm{Var}\left[Y_i + c\,(Z_i - \mathbb{E}[Z])\right] = \sigma^2_Y + 2c\,\sigma_Z\sigma_Y\rho_{ZY} + c^2\sigma^2_Z \;=:\; \sigma^2_c, \qquad (4.3.1)$$
where $\sigma^2_Z = \mathrm{Var}[Z]$, $\sigma^2_Y = \mathrm{Var}[Y]$, and $\rho_{ZY}$ is the correlation factor between Z and Y. In order to minimise the variance of the replication we choose a c* that
² Since $\mathbb{E}[Y^c_i] = \mathbb{E}[Y_i + c(Z_i - \mathbb{E}[Z])] = \mathbb{E}[\hat{Y}] = \mathbb{E}[Y]$.
minimises Equation 4.3.1 and is expressed as
$$c^* = -\frac{\sigma_Y}{\sigma_Z}\rho_{ZY} = -\frac{\mathrm{Cov}[Z,Y]}{\mathrm{Var}[Z]}. \qquad (4.3.2)$$
Substituting Equation 4.3.2 into 4.3.1 we get
$$\mathrm{Var}\left[\hat{Y}^{c^*}\right] = \mathrm{Var}[\hat{Y}] - 2\frac{\mathrm{Cov}[Z,Y]}{\mathrm{Var}[Z]}\mathrm{Cov}[Z,Y] + \frac{\mathrm{Cov}[Z,Y]^2}{\mathrm{Var}[Z]^2}\mathrm{Var}[Z] = \mathrm{Var}[\hat{Y}] - \frac{\mathrm{Cov}[Z,Y]^2}{\mathrm{Var}[Z]}; \qquad (4.3.3)$$
therefore, as long as Cov[Z,Y] ≠ 0 we can achieve a reduction in the variance of the estimated variate. In fact, the higher the correlation $\rho_{ZY}$ is, the higher the reduction in variance will be.
All that remains now is to find an appropriate control variate that has non-zero correlation with the estimated variate. Since we are trying to find the discounted payoff of a call option, we could use the Black & Scholes SDE as the Z control variate in the above example.
Remark. If we were to examine the ratio of the controlled estimator to that of the uncontrolled estimator we could derive that
$$\frac{\mathrm{Var}\left[\hat{Y} + c^*(\hat{Z} - \mathbb{E}[Z])\right]}{\mathrm{Var}[\hat{Y}]} = 1 - \rho^2_{ZY}, \qquad (4.3.4)$$
which implies that the stronger the correlation between the estimated variate Y and the control variate Z, the more effective the control variate is. N.B. this effect is independent of the sign of the correlation, since the sign is cancelled by the square in Equation 4.3.4.
The question remains, however, as to how to estimate the correlation between the two variates. This can be achieved by doing t training simulation runs in which the correlation coefficient is calculated, first by estimating the covariance of the two variates as
$$\widehat{\mathrm{Cov}}[Y,Z] = \frac{\sum_{\jmath=1}^{t}\left(Y_\jmath - \hat{Y}_t\right)\left(Z_\jmath - \mathbb{E}[Z]\right)}{t-1}.$$
By selecting an appropriate control variable, we are able to calculate the expected value as well as the sample variance of the replications. For the variance we get
$$\widehat{\mathrm{Var}}[Z] = \frac{\sum_{\jmath=1}^{t}\left(Z_\jmath - \mathbb{E}[Z]\right)^2}{t-1},$$
and hence we can derive from Equation 4.3.2 the optimal constant $\hat{c}^*$ to use for the control variate simulation. Once we have this constant we can use it to perform a simulation and reduce the variance with a control variate.
A simple algorithm to simulate the variate V with a control variate Z is given in Algorithm 2.
Algorithm 2 Monte Carlo Simulation with control variate(t, N)
Require: t > 1, N > 0.
Ensure: E[Y] = V̂_N.
1: for i = 1 to t do
2:   generate (Y_i, Z_i)  {training run to calculate the constant factor c*}
3: end for
4: ĉ* ← calculate c*
5: for i = 1 to N do
6:   generate (Y_i, Z_i)  {actual simulation}
7:   V_i ← Y_i + ĉ*(Z_i − E[Z])
8: end for
9: V̂_N ← (Σ_{i=1}^{N} V_i) / N
10: V̂ar[V]_N ← (Σ_{i=1}^{N} (V_i − V̂_N)²) / (N−1)
11: 100(1−α)% CI ← [ V̂_N − z_{1−α/2}·√(V̂ar[V]_N)/√N , V̂_N + z_{1−α/2}·√(V̂ar[V]_N)/√N ]
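A minimal Matlab/Octave sketch of Algorithm 2 is given below; it prices a European call and uses the discounted terminal stock price as the control variate Z, whose expectation $e^{-rT}\mathbb{E}[S_T] = S_0$ is known exactly under the risk-neutral measure. The choice of control and all numerical values are illustrative assumptions, not the thesis's own implementation.

% Sketch of Algorithm 2: control-variate Monte Carlo for a European call.
% Control Z = discounted terminal stock price, with known E[Z] = S0.
S0 = 100; K = 100; r = 0.05; sigma = 0.2; T = 1;
t = 1e4; N = 1e5; EZ = S0;
sim = @(n) S0*exp((r - 0.5*sigma^2)*T + sigma*sqrt(T)*randn(n,1)); % terminal prices

% Training run: estimate the optimal coefficient c* from Eq. (4.3.2)
STt = sim(t);  Yt = exp(-r*T)*max(STt - K, 0);  Zt = exp(-r*T)*STt;
C = cov(Zt, Yt);                      % 2x2 sample covariance matrix
cstar = -C(1,2)/var(Zt);              % c* = -Cov[Z,Y]/Var[Z]

% Actual simulation with the control variate applied
ST = sim(N);  Y = exp(-r*T)*max(ST - K, 0);  Z = exp(-r*T)*ST;
V  = Y + cstar*(Z - EZ);
fprintf('plain MC: %.4f (var %.4g), with control: %.4f (var %.4g)\n', ...
        mean(Y), var(Y)/N, mean(V), var(V)/N);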
4.3.2 Antithetic variates technique
This method reduces variance by introducing a negative dependence between pairs of replications. We will present the case where the replications are sampled from a Uniform distribution; however, this method can take various forms. As [Gla04, p. 205] mentions, the method can be extended to various distributions via the inverse transform method, where F⁻¹(U) and F⁻¹(1−U) both have distribution F but are antithetic to each other because F⁻¹ is monotonic. As an example, it is possible to use pairs of replications from the normal distribution by pairing a sequence $Z_1, Z_2, \ldots$ of i.i.d. $N_{(0,1)}$ random variables with the antithetic sequence $-Z_1, -Z_2, \ldots$ of i.i.d. $N_{(0,1)}$ random variables.
Let us now extend the paradigm presented in §4.3.1 for the price of a call option. In this case we generate pairs of antithetic replications from a Uniform distribution and use them in the unbiased estimator
$$Z_\imath = \frac{Y_i + \tilde{Y}_i}{2}, \qquad (4.3.5)$$
where $Y_i = h(U_i)$ is the payoff of the call option sampled on a Uniform distribution $U_i \sim U_{(0,1)}$, and $\tilde{Y}_i = h(1-U_i)$ is the payoff of the call option sampled on the antithetic Uniform distribution $1-U_i \sim U_{(0,1)}$.
Since $\mathbb{E}[Y] = \mathbb{E}[\tilde{Y}] = \hat{Y}$, it follows from Equation (4.3.5) that $Z_i$ is an unbiased estimator of $\hat{Y}$, and because the $U_i$'s are i.i.d. we can use the $Z_i$ to construct confidence intervals. Algorithm 3 shows how the MC simulation would implement antithetic variates.
Algorithm 3 Monte Carlo Simulation with antithetic variates(N)
Require: N > 0.
Ensure: E[Y] = Ẑ_N.
1: for i = 1 to N do
2:   generate U_i
3:   Y_i ← h(U_i)
4:   Ỹ_i ← h(1 − U_i)
5:   Z_i ← (Y_i + Ỹ_i)/2
6: end for
7: Ẑ_N ← (Σ_{i=1}^{N} Z_i) / N
8: V̂ar[Z]_N ← (Σ_{i=1}^{N} (Z_i − Ẑ_N)²) / (N−1)
9: 100(1−α)% CI ← [ Ẑ_N − z_{1−α/2}·√(V̂ar[Z]_N)/√N , Ẑ_N + z_{1−α/2}·√(V̂ar[Z]_N)/√N ]
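The Matlab/Octave sketch below is a minimal rendering of Algorithm 3 for the same European call, driving the payoff through uniforms mapped by the inverse normal CDF; all numerical values are illustrative assumptions.

% Sketch of Algorithm 3: antithetic variates for a European call.
% Parameter values are illustrative.
S0 = 100; K = 100; r = 0.05; sigma = 0.2; T = 1; N = 5e4;
Finv = @(u) sqrt(2)*erfinv(2*u - 1);      % inverse standard normal CDF
h    = @(u) exp(-r*T)*max(S0*exp((r - 0.5*sigma^2)*T + sigma*sqrt(T)*Finv(u)) - K, 0);
U  = rand(N,1);
Y  = h(U);            % payoff from U
Yt = h(1 - U);        % payoff from the antithetic uniform 1 - U
Z  = (Y + Yt)/2;      % Eq. (4.3.5)
fprintf('per-replication variance: plain %.4g, antithetic %.4g\n', var(Y), var(Z));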
4.3.3 Quasi-random simulation
One more procedure for reducing the variance of the simulation is to sample the input variates in a way that lowers the variance of the estimate. Such numbers can be drawn from the so-called "low discrepancy" sequences. A sequence's discrepancy is a measure of its uniformity and is defined as follows [see [Lev02]].
Definition 4.1. Given a set of points $x_1, x_2, \ldots, x_N \in I^S$ and a subset $G \subseteq I^S$, define the counting function $S_N(G)$ as the number of points $x_i \in G$. For each $x = (x_1, x_2, \ldots, x_S) \in I^S$, let $G_x$ be the rectangular S-dimensional region
$$G_x = [0, x_1) \times [0, x_2) \times \cdots \times [0, x_S),$$
with volume $x_1 x_2 \cdots x_S$. Then the discrepancy of the points $x_1, x_2, \ldots, x_N$ is given by
$$D^*_N(x_1, x_2, \ldots, x_N) = \sup_{x \in I^S}\left| S_N(G_x) - N\,x_1 x_2 \cdots x_S \right|.$$
The discrepancy of a sequence compares the number of sample points actually found in a given volume of a multi-dimensional space against the number of points that should fall in that volume if the sequence were uniformly distributed.
There are a few sequences that are commonly used to generate quasi-random variates. The Numerical Algorithms Group (NAG) libraries provide three sequence generators: the Niederreiter [Nid92], the Sobol [Sob67], and the Faure [FAU81] sequences are implemented in MATLAB with the functions g05yl and g05ym.
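For readers without the NAG library, MATLAB's Statistics Toolbox offers a comparable Sobol generator; the sketch below is a minimal illustration under that assumption and is not the generator used in the thesis.

% Sketch: generate 2-D Sobol quasi-random points and compare them visually
% with pseudo-random points (cf. Figure 4.2). Uses the Statistics Toolbox
% sobolset/net functions rather than the NAG g05yl/g05ym routines.
p = sobolset(2, 'Skip', 1);     % 2-dimensional Sobol sequence
Q = net(p, 1024);               % first 1024 quasi-random points
R = rand(1024, 2);              % pseudo-random points for comparison
subplot(1,2,1); plot(Q(:,1), Q(:,2), '.'); title('Sobol quasi-random');
subplot(1,2,2); plot(R(:,1), R(:,2), '.'); title('pseudo-random');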
4.4 Discretisation Schemes for Stochastic Differential Equations
In §4.3 we described methods for reducing the variance of the Monte Carlo simulation and thus increasing the precision of the estimated value. However, there is one more source of error that needs to be taken into account and addressed, and that is the simulation bias due to the discretisation of the SDE. One way to think of this is by analogy with shooting arrows at a bullseye target: high-precision shots form a tight cluster of arrows, but the cluster could lie completely outside the circles due to high bias. We will now present the Euler scheme, which is the simplest and most common form of discretisation, before we proceed to refine and extend it with alternative schemes.
4.4.1 Euler-Maruyama scheme
Let us consider the case of the following SDE:
$$dX(t) = \mu(X(t))\,dt + \sigma(X(t))\,dW(t). \qquad (4.4.1)$$
Also let $\hat{X}$ be the discretised approximation of X. On the temporal grid $0 = t_0 < t_1 < \cdots < t_m$, the Euler-Maruyama [Mar55] approximation $\hat{X}$ is
$$\hat{X}(t_{i+1}) = \hat{X}(t_i) + \mu(\hat{X}(t_i))\,\Delta t + \sigma(\hat{X}(t_i))\sqrt{\Delta t}\,Z_{i+1}, \qquad (4.4.2)$$
for $i = 0, \ldots, m-1$, with $Z_i$ i.i.d. standard normal variates and $\Delta t = t_{i+1} - t_i$.
Since the discretisation is an approximation process, it is imperative to measure how accurate this approximation ultimately is. To do this we need to evaluate the discrepancy between the SDE and its discrete approximation conditional on the size of $\Delta t$. In essence we want to evaluate whether the error
$$\epsilon = \mathbb{E}\left[\,\|X(t) - \hat{X}(t)\|\,\right] \to 0$$
when t!0.There are two accepted metrics to measure this discrepancy;the
weak convergence that shows the error of the mean,and the strong convergence
which shows the mean of the error.
A typical weak convergence error has the form,

weak
t
:= sup
0t
i
T


E[f(X(t))] E
h
f(
^
X(t))
i

;
where f is a smoothing polynomial of some order k.We say that the discretised
approximation
^
X(t) converges weakly if 
weak
t
!0 when t!0.The order of
the weak convergence is > 0 when

weak
t
 Ct

;
for some scalar C and for all sufficiently small t.
Conversely the discretised approximation
^
X(t) converges strongly if for the
strong error

strong
t
:= sup
0t
i
T


E[X(t)] E
h
^
X(t)
i

;
we have that 
string
t
!0 when t!0.Similarly as above the order of the strong
convergence is > 0 when

strong
t
 Ct

;
for some scalar C and for all sufficiently small t.According to [Gla04,p.345],
the Euler scheme typically has a strong order of
1
2
,but often achieves a weak
order of 1.
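The strong order of the Euler scheme can be checked empirically when the SDE has a known solution; the Matlab/Octave sketch below does this for a geometric Brownian motion, whose exact terminal value is available in closed form. All values are illustrative assumptions.

% Sketch: empirical strong error of the Euler-Maruyama scheme for a GBM,
% dX = mu*X dt + sigma*X dW, whose exact solution is known.
mu = 0.05; sigma = 0.2; X0 = 1; T = 1; nPaths = 2e4;
for k = 1:4
    m  = 2^(k+3);  dt = T/m;
    dW = sqrt(dt)*randn(nPaths, m);
    X  = X0*ones(nPaths,1);
    for i = 1:m
        X = X + mu*X*dt + sigma*X.*dW(:,i);            % Euler step
    end
    Xexact = X0*exp((mu - 0.5*sigma^2)*T + sigma*sum(dW,2));
    fprintf('dt = %.4f  strong error = %.5f\n', dt, mean(abs(X - Xexact)));
end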
4.4.2 Milstein scheme
This scheme was first proposed by Milstein [Mil95], and is explained in detail by Glasserman [Gla04, pp. 340-344], by Kloeden and Platen [KPS94] for more general processes, and by Kahl and Jäckel [KJ06] for stochastic volatility processes. The scheme works for SDEs in which the drift and diffusion terms do not depend on time directly. Let us take the case of a stochastic process as defined in Equation (4.4.1). The Milstein discretisation scheme can then be expressed as
$$\hat{X}(i+1) = \hat{X}(i) + \mu(\hat{X}(i))\,\Delta t + \sigma(\hat{X}(i))\sqrt{\Delta t}\,Z_{i+1} + \frac{1}{2}\sigma'(\hat{X}(i))\,\sigma(\hat{X}(i))\,\Delta t\,(Z^2_{i+1} - 1). \qquad (4.4.3)$$
The discretisation in Equation 4.4.3 is composed of a deterministic part defined by the $\mu$ term, a stochastic part defined by the $\sigma$ term, and an Itô correction term.
4.4.3 Kahl-Jäckel scheme
Kahl and Jäckel [KJ06, p. 24] propose an implicit Milstein scheme for the variance in combination with an alternative discretisation for the underlying's price process. Specifically they refer to this stochastic volatility scheme as IJK and define it as
$$\hat{V}(t+\Delta t) = \frac{\hat{V}(t) + \kappa\theta\,\Delta t + \sigma\sqrt{\hat{V}(t)}\,Z_V\sqrt{\Delta t} + \frac{1}{4}\sigma^2\Delta t\,(Z^2_V - 1)}{1 + \kappa\,\Delta t}, \qquad (4.4.4)$$
$$\ln\hat{X}(t+\Delta t) = \ln\hat{X}(t) - \frac{\Delta t}{4}\left(\hat{V}(t+\Delta t) + \hat{V}(t)\right) + \rho\sqrt{\hat{V}(t)}\,Z_V\sqrt{\Delta t} + \frac{1}{2}\left(\sqrt{\hat{V}(t+\Delta t)} + \sqrt{\hat{V}(t)}\right)\left(Z_X\sqrt{\Delta t} - \rho Z_V\sqrt{\Delta t}\right) + \frac{1}{4}\rho\sigma\,\Delta t\,(Z^2_V - 1), \qquad (4.4.5)$$
where θ is the equilibrium supported by the fundamentals, κ is the rate at which the shocks dissipate and the variance returns to θ, σ is the degree of volatility around it caused by shocks, and $Z_X, Z_V$ are standard Normal variates.
By finding the minimum of the variance function and forcing its value to be positive, we can easily derive that the variance is guaranteed to be strictly positive if $4\kappa\theta > \sigma^2$. In reality, as Andersen [And07] mentions, it is unrealistic to uphold this constraint in practical situations, hence this scheme can and will produce negative variances. Andersen then proposes a full truncation to 0 when a value is negative. This means that Equation (4.4.4) substitutes $\hat{V}(\cdots)^{+} = \max(\hat{V}(\cdots), 0)$ wherever $\hat{V}(\cdots)$ appears.
4.4.4 Broadie-Kaya exact calculation scheme
Broadie and Kaya [BK06] presented a discretisation process that is completely bias-free. This exact scheme, though, has limitations³ and sub-optimal performance even against the "simple" Euler scheme, as Lord et al. [LKvD06] showed in their numerical comparisons⁴.
To obtain the bias-free scheme, begin with the consecutive application of Itô's Lemma, first to get the explicit form and then to pass to a Cholesky decomposition (see [And07, p. 7] for a detailed derivation). What is finally obtained is
$$V(t+\Delta t) = V(t) + \int_t^{t+\Delta t}\kappa(\theta - V(u))\,du + \sigma\int_t^{t+\Delta t}\sqrt{V(u)}\,dW_V(u), \qquad (4.4.6)$$
$$\ln X(t+\Delta t) = \ln X(t) + \frac{\rho}{\sigma}\left(V(t+\Delta t) - V(t) - \kappa\theta\,\Delta t\right) + \left(\frac{\kappa\rho}{\sigma} - \frac{1}{2}\right)\int_t^{t+\Delta t} V(u)\,du + \sqrt{1-\rho^2}\int_t^{t+\Delta t}\sqrt{V(u)}\,dW(u). \qquad (4.4.7)$$
The distribution of $\ln X(t+\Delta t)$ is clearly Normal, and after sampling $V(t+\Delta t)$ from a non-central $\chi^2$, with degrees of freedom defined from a Poisson sampling, we can draw a sample of $\int_t^{t+\Delta t} V(u)\,du \mid V(t+\Delta t)$ and calculate the next log-value of X. Since this last sampled distribution is conditional on the next value of the variance, we can identify it as a Brownian bridge.
³ Due to lack of speed and the high complexity of implementation.
⁴ Most likely due to the reliance on acceptance-rejection sampling, which induces a performance penalty.
4.4.5 Quadratic Exponential (QE) scheme
In 2005 Andersen [And07] proposed a new scheme to discretise the stochastic volatility and the price of an underlying asset. This scheme takes advantage of the fact that a non-central $\chi^2$ sampled variate can be approximated by a related distribution that is moment-matched to the conditional first and second moments of the non-central $\chi^2$ distribution.
As Andersen points out, although a cubic transformation of a Normal RV is a more accurate representation of the distribution close to 0, it introduces negative values of variance. Thus the quadratic representation is adopted, with a special case for low values of V(t). When V(t) is sufficiently large, we get
$$\hat{V}(t+\Delta t) = a\,(b + Z_V)^2, \qquad (4.4.8)$$
where $Z_V$ is an $N_{(0,1)}$ Gaussian RV, and a, b are scalars that will be determined by moment-matching. For the complementary low values of V(t) the distribution can – asymptotically – be approximated by
$$\Pr\left(\hat{V}(t+\Delta t) \in [x, x+dx]\right) \approx \left(p\,\delta(0) + \beta(1-p)\,e^{-\beta x}\right)dx, \qquad x \ge 0, \qquad (4.4.9)$$
where δ is the Dirac delta function (strongly reflective at 0), and p and β are positive scalars to be calculated. The scalars a, b, p, β depend on the parameters of the Heston model and the time granulation Δt, and will be calculated by moment-matching the exact distribution.
To sample from these distributions there are two cases to take into account:
• Sample from the normal $N_{(0,1)}$ Gaussian RV and calculate $\hat{V}(t+\Delta t)$ from Equation (4.4.8).
• To sample the small values of V, the inverse of Equation (4.4.9) will be used. The inverse of the distribution function is
$$\Psi^{-1}(u) = \Psi^{-1}(u; p, \beta) = \begin{cases} 0 & \text{if } 0 \le u \le p,\\ \beta^{-1}\ln\!\left(\frac{1-p}{1-u}\right) & \text{if } p < u \le 1.\end{cases} \qquad (4.4.10)$$
The value of V can then be sampled from
$$\hat{V}(t+\Delta t) = \Psi^{-1}(U_V; p, \beta), \qquad (4.4.11)$$
where $U_V$ is a uniform RV.
The rule for deciding which discretisation of V to use depends on the non-centrality of the distribution, and can be triaged based on the value of ψ. The value of ψ is
$$\psi := \frac{s^2}{m^2} = \frac{\frac{\hat{V}(t)\,\sigma^2 e^{-\kappa\Delta t}}{\kappa}\left(1 - e^{-\kappa\Delta t}\right) + \frac{\theta\sigma^2}{2\kappa}\left(1 - e^{-\kappa\Delta t}\right)^2}{\left(\theta + (\hat{V}(t) - \theta)\,e^{-\kappa\Delta t}\right)^2}, \qquad (4.4.12)$$
where m and s² are the conditional mean and variance of the exact distribution being matched. What Andersen showed was that the quadratic scheme of Equation 4.4.8 can only be moment-matched for ψ ≤ 2, and similarly the exponential scheme of Equation 4.4.11 can only be moment-matched for ψ ≥ 1. It emerges then that there is an interval ψ ∈ [1,2] where the two schemes overlap. Intuitively, Andersen chooses the midpoint of this interval as the cut-off point between the schemes; thus the cut-off is $\psi_c = 1.5$.
Since we have defined the discretisation process for the QE scheme, with Equations 4.4.8 and 4.4.11, and the cut-off discriminator, what is left is to calculate the remaining parameters a, b, p, β for each case. The algorithm for this process is detailed in Algorithm 4. This algorithm is implemented in MATLAB [see Listing A.4 for details], and is used for the numerical comparisons of acceleration.
Algorithm 4 Quadratic Exponential Variance Reduction
Require: The present value of the variance, V̂(t)
Ensure: The value of the variance at the subsequent time-step, V̂(t+Δt)
1: Compute m ← θ + (V̂(t) − θ)·e^{−κΔt}
2: Compute s² ← (V̂(t)σ²e^{−κΔt}/κ)·(1 − e^{−κΔt}) + (θσ²/(2κ))·(1 − e^{−κΔt})²
3: Compute ψ ← s²/m²
4: if ψ ≤ ψ_c then
5:   Compute b² ← 2ψ⁻¹ − 1 + √(2ψ⁻¹)·√(2ψ⁻¹ − 1)
6:   Compute a ← m/(1 + b²)
7:   Generate a Normal random variate Z_V
8:   return V̂(t+Δt) ← a(b + Z_V)²
9: else
10:   Compute p ← (ψ − 1)/(ψ + 1) ∈ [0, 1)
11:   Compute β ← (1 − p)/m
12:   Generate a Uniform random variate U_V
13:   if 0 ≤ U_V ≤ p then
14:     return V̂(t+Δt) ← 0
15:   else
16:     return V̂(t+Δt) ← β⁻¹ ln((1 − p)/(1 − U_V))
17:   end if
18: end if
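The Matlab/Octave sketch below is a minimal rendering of one QE variance time-step following Algorithm 4 above; it is not the thesis's own implementation (Listing A.4 in the appendix contains that), and the parameter values are illustrative assumptions.

% Sketch: one time-step of Andersen's QE scheme for the Heston variance
% (Algorithm 4). Values of kappa, theta, sigma, dt and v are illustrative.
kappa = 3.0; theta = 0.04; sigma = 0.6; dt = 1/250; v = 0.05; psiC = 1.5;
m  = theta + (v - theta)*exp(-kappa*dt);                       % conditional mean
s2 = v*sigma^2*exp(-kappa*dt)/kappa*(1 - exp(-kappa*dt)) ...
   + theta*sigma^2/(2*kappa)*(1 - exp(-kappa*dt))^2;           % conditional variance
psi = s2/m^2;
if psi <= psiC                                 % quadratic branch, Eq. (4.4.8)
    b2 = 2/psi - 1 + sqrt(2/psi)*sqrt(2/psi - 1);
    a  = m/(1 + b2);
    vNext = a*(sqrt(b2) + randn)^2;
else                                           % exponential branch, Eq. (4.4.11)
    p    = (psi - 1)/(psi + 1);
    beta = (1 - p)/m;
    u = rand;
    if u <= p
        vNext = 0;
    else
        vNext = log((1 - p)/(1 - u))/beta;
    end
end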
4.5 Generating random realisations of variates
4.5.1 General considerations
A pseudo-random number sequence is one that tries to approximate a truly random process by means of a deterministic algorithm for generating continuous variates that relies on a specific initial condition, called a seed. Since the process is deterministic, the generated sequence cannot be truly random; however, it will not fail all tests of randomness. It is not possible to create a Pseudo-Random Number Generator (PRNG) that passes all empirical statistical tests (see [Knu97] for more on this matter), but we can accept empirically that good PRNGs will pass all simple tests, while probably failing more complex ones. Another property of PRNGs is that the generated sequences are periodic, meaning that after a certain number of variates has been generated, the same variate as the very first one will be generated again. However, since the deterministic generation relies on a specific mathematical function to generate the random variates, a PRNG implemented on a CPU is significantly more efficient at generating very large sequences than generating truly random numbers with True-Random Number Generators (TRNGs)⁵.
There are two qualities that characterise a sequence's randomness: its uniformity and its independence. The most common tests of uniformity are the Kolmogorov-Smirnov (KS) test and the χ² test. The autocorrelation test checks for independence by identifying periodicity in the sequences.
There are quite a number of algorithms for generating pseudo-random numbers. Some of the most commonly used are mentioned in the sections below.
⁵ The simplest example of a truly random number sequence would be a sequence of numbers recorded by throwing a fair die a large enough number of times. More sophisticated generators might use natural phenomena or chaotic sources that, once observed, can produce true randomness (e.g. lava lamps [http://www.lavarnd.org], atmospheric white noise, CCD chip white noise, etc.).
4.5.2 Sampling methods
4.5.2.1 Inversion method
The inverse transform sampling method relies on the following theorem.
Theorem 4.2. Let X be a random variate from a continuous distribution with a strictly increasing cumulative distribution function F(x). Let U be a uniform(0,1) random variate; then $F^{-1}(U) \sim F(x)$.
Recall that for $y \in [0,1]$, $P(U \le y) = y$. Then
$$P(F^{-1}(U) \le x) = P(U \le F(x)) = F(x) = P(X \le x).$$
The inversion process is then:
1. Generate $U \sim U(0,1)$.
2. Output $X = F^{-1}(U)$.
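As a small illustration of the inversion method, the exponential distribution has $F(x) = 1 - e^{-\lambda x}$, whose inverse is available in closed form; the sketch below is a textbook example and not code from the thesis.

% Sketch: inverse-transform sampling of Exp(lambda) variates,
% using F^{-1}(u) = -log(1-u)/lambda.
lambda = 2; n = 1e5;
U = rand(n, 1);                    % step 1: uniform(0,1) variates
X = -log(1 - U)/lambda;            % step 2: X = F^{-1}(U)
fprintf('sample mean %.4f (theory %.4f)\n', mean(X), 1/lambda);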
4.5.2.2 Acceptance-rejection method
Consider a random variate X with a Probability Density Function (PDF) f(x) and support (a,b), i.e. f(x) = 0 when x ∉ [a,b]. Assume also that the density has an upper bound K, i.e. f(x) ≤ K for all x. The acceptance-rejection process is then:
1. Generate uniform random variates $U_1 \sim U_{(a,b)}$ and $U_2 \sim U_{(0,K)}$.
2. If $U_2 \le f(U_1)$, output $X = U_1$; else go to step 1.
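A minimal Matlab/Octave sketch of the acceptance-rejection recipe above, using the density f(x) = 6x(1−x) on (0,1), bounded by K = 1.5, purely as an illustrative target.

% Sketch: acceptance-rejection sampling from f(x) = 6x(1-x) on (0,1),
% which is bounded above by K = 1.5.
f = @(x) 6*x.*(1-x);  a = 0; b = 1; K = 1.5;  n = 1e5;
X = zeros(n,1); i = 0;
while i < n
    U1 = a + (b-a)*rand;          % candidate, uniform on (a,b)
    U2 = K*rand;                  % uniform on (0,K)
    if U2 <= f(U1)                % accept with probability f(U1)/K
        i = i + 1;  X(i) = U1;
    end
end
fprintf('sample mean %.4f (theory 0.5)\n', mean(X));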
4.5.3 Uniform random samples
The continuous uniform distribution is part of a family of distributions in which all intervals of the same length are equally probable. What follows is a presentation of three procedures for sampling from a Uniform distribution: the Linear Congruential Generator (LCG), the Mersenne twister, and L'Ecuyer's methods.
4.5.3.1 Linear Congruential Generator (LCG)
The LCG is the oldest and best-known method for generating uniform random numbers. It is a recursive algorithm that takes advantage of some properties of prime numbers. Like most PRNGs it relies on an initial value to seed it, which becomes the starting value of the pseudo-random sequence. The next value is deterministically derived from
$$Z_{i+1} = (a Z_i + c) \bmod m,$$
where $m$ is the modulus, $a$ is the multiplier, and $c$ is the increment. Since the next variate in the sequence is reduced modulo $m$, it is not possible to obtain a value greater than or equal to $m$. Thus we can derive uniform numbers $U_i \in [0,1)$ by $U_i = Z_i / m$.
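A minimal MATLAB sketch of such a generator is shown below. The constants ($m = 2^{31} - 1$, multiplier 16807, increment 0) are the classic Park-Miller choices and are used purely for illustration; they are not the constants used elsewhere in this work.

function U = lcgUniform(n, seed)
% Generate n uniform variates on [0,1) with a linear congruential generator.
    m = 2^31 - 1;            % modulus (a Mersenne prime)
    a = 16807;               % multiplier
    c = 0;                   % increment
    Z = zeros(n, 1);
    Z(1) = mod(a*seed + c, m);
    for i = 2:n
        Z(i) = mod(a*Z(i-1) + c, m);
    end
    U = Z/m;                 % scale the integers into [0,1)
end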
4.5.3.2 Mersenne twister
The Mersenne Twister (MT) is an exceptionally high-quality, fast random number generator. It was developed by Makoto Matsumoto and Takuji Nishimura in 1996 and further extended in 2002; the current implementation is MT19937, with a 32-bit word length. The algorithm has a period length of $2^{19937} - 1$ and has been shown to be equidistributed in 623 dimensions (see [MN98]). The function g05kg(genid, subid) from the NAG library was used to set up the generation of uniform random variates on $(0,1)$; when genid = 3, the MT algorithm is used to generate them.
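MATLAB's own default generator is also MT19937, so an equivalent uniform stream can be produced without the NAG library; the seed below is illustrative.

rng(1234, 'twister');        % seed the built-in Mersenne Twister
U = rand(1000, 1);           % uniform variates on (0,1)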
4.5.4 Normal random samples
The normal or Gaussian distribution has a characteristic bell-like shape, which gives it the informal name of the bell curve. It is considered the most prominent distribution in statistics and arises in a multitude of natural phenomena. The Gaussian function that describes the bell curve is
$$f(x; \mu, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2},$$
where $\mu$ is the expected value of the distribution and $\sigma^2$ is its variance; for the standard normal distribution, $\mu = 0$ and $\sigma^2 = 1$.
The most common and efficient way to generate normal variates $N(0,1)$ is the Box-Muller transformation. The method relies on first sampling two uniform random variates and then performing a polar transformation on them.

Lemma 4.3. Let $U_1, U_2$ be independent random variates uniformly distributed on the interval $(0,1)$. We can then generate two normal random variates,
$$Z_0 = \sqrt{-2\ln U_1}\,\cos(2\pi U_2),$$
$$Z_1 = \sqrt{-2\ln U_1}\,\sin(2\pi U_2);$$
then $Z_0, Z_1$ are independent random variates, normally distributed with $\mu = 0$ and $\sigma = 1$.
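As an illustration of Lemma 4.3, the transformation can be coded directly in a few lines of MATLAB; this is a sketch only, since the project itself used the NAG routine shown in Listing 4.1 below.

n  = 1e5;
U1 = rand(n, 1);             % two independent uniform samples
U2 = rand(n, 1);
R  = sqrt(-2*log(U1));       % radial part of the polar transformation
Z0 = R.*cos(2*pi*U2);        % first N(0,1) variate
Z1 = R.*sin(2*pi*U2);        % second N(0,1) variate, independent of Z0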
In the case of this implementation the NAG library was used to generate the random variates. An example of how to generate them is given in Listing 4.1.

function [x, iseedOut, ifail] = GenerateNormalNAG(mu, var, n, gen)
    n    = int64(n);
    igen = int64(gen);
    [igenOut, iseed] = g05kc(igen);                          % initialise the NAG generator
    [igen, iseed]    = g05kb(igen, iseed);                   % seed the generator
    [x, iseedOut, ifail] = g05la(mu, var, n, igen, iseed);   % draw n variates from N(mu, var)

Listing 4.1
Generate a $1 \times n$ vector of $N(0,1)$ random variates.
Figure 4.1
Monte Carlo (MC) simulation procedure. The flowchart contrasts static MC (uniform generator, generated r.v. U, transformation rules, simulated r.v. X, simulation algorithm loop, variance reduction, sample paths, law of large numbers, Monte Carlo estimation) with dynamic MC (arbitrage-free model, risk-neutral SDE, discretisation scheme or series expansion, approximate random walk or approximate random path).
Figure 4.2
The Sobol quasi-random uniform variates on the left, versus the pseudo-random variates on the right; both panels show points on the unit square $[0,1]^2$. The NAG library was used for the quasi-random variates.
Figure 4.3
The Niederreiter quasi-random uniform variates on the left, versus the pseudo-random variates on the right; both panels show points on the unit square $[0,1]^2$. The NAG library was used for the quasi-random variates.
Figure 4.4
The Faure quasi-random uniform variates on the left, versus the pseudo-random variates on the right; both panels show points on the unit square $[0,1]^2$. The NAG library was used for the quasi-random variates.
Chapter 5
Dataflow programming on FPGAs
This approach to accelerating code takes advantage of the fact that, most of the time, the Central Processing Unit (CPU) is busy working out the scheduling of instructions and the branch prediction of the program. The purpose of an FPGA is to provide a customisable, "field-programmable" chip that can be optimised to perform calculations for a specific problem domain. This is achieved by allowing the logic blocks on the chip to be re-wired, so that even after a board has been shipped it can be re-wired and re-purposed.

This re-wiring is achieved via a Hardware Description Language (HDL). The language offers the ability to interconnect the logic blocks into different combinations, to cater for complex combinatorial functions, and to manage the on-chip memory.¹

¹ This is mostly in the form of either flip-flops or more structured blocks of memory.
5.1 Applications of FPGAs in computational finance
One company that has made breakthroughs in accelerating financial models on FPGAs is Maxeler Technologies. Their paradigm shift, from the von Neumann control-flow architecture to the dataflow architecture, allows for much higher computational specialisation and acceleration. The control-flow architecture can be likened to a mechanic's workshop, where one person carries out all stages of constructing a product, e.g. a motorcycle. This work need not follow any fixed sequence: while building the motorcycle, the mechanic can also step aside and work on part of a car before returning to the motorcycle according to a schedule. The antipode of this paradigm is a motorcycle production line, where each station on the line is optimised to perform one action, and the overall process is as quick as the flow. What FPGAs provide is a way to create workers, called kernels in Maxeler's terminology, that are highly specialised and extremely quick at carrying out one specialised operation. The kernels are large synchronous dataflow pipelines that implement the mathematics and the control of the problem. They are asynchronously coupled to other kernels and to I/O sources and sinks (DRAM, Peripheral Component Interconnect express (PCIe), inter-chip links, etc.) by the manager.
An additional benefit of this technology is its low power footprint compared with a standard CPU. Since a CPU is clocked at a much higher frequency than an FPGA, the electrical resistance of its transistors releases more energy in the form of heat. This heat accumulates in a server room and needs to be dissipated; in large server clusters, cooling can amount to a significant cost. Typically, a cluster's power consumption is split roughly evenly between the servers themselves and the cooling systems that support them. FPGA chips therefore tend to offer a significant reduction in these electricity costs.
5.2 FPGA versus Intel multi-core
Thus far, CPU development has adhered to Moore's law.² However, the limits on how far transistors can be scaled down, coupled with the power consumption that grows as more transistors are fitted on a chip, cast doubt on the continued relevance of Moore's law. What might actually happen instead is that transistor counts keep doubling every 18 months, but mainly because the number of cores in each chip doubles. This means that Operating Systems (OSs) will be able to take advantage of multiple cores within a CPU and, via efficient scheduling, maximise performance while minimising power consumption.

The benefit of the Intel multi-core approach is that the current programming paradigm can be retained, and most existing code can be ported to the many-core architecture far more easily and quickly than to the more exotic GPU and FPGA implementations.
The FPGA, on the one hand, can leverage two advantages over the CPU approach. First, it has more silicon dedicated to calculations than the CPU. Second, it relies on the dataflow architecture to do away with the taxing aspects of instruction scheduling and branch prediction. This way the calculation pipeline is always full and a result is produced every clock cycle (see Figure 5.2 for more details). On the other hand the CPU, as shown in Figure 5.1, needs to handle concurrent threads vying for their turn on the Arithmetic Logic Unit (ALU) in order to progress their calculations.

² Moore stated in 1965 that the transistor density of semiconductor chips would double roughly every 18 months.

Figure 5.1
The process architecture on a CPU, where the ALU is referred to as the Function Unit. Data has to be moved from memory into the Function Unit and then moved back into memory for storage. (Photo used by permission of Maxeler Technologies.)
5.3 Scope for FPGAs
The FPGA lends itself most aptly to problems of a difference-engine nature. For instance, it has been used successfully in the seismic acquisition industry to perform finite-difference modelling of geophysical models, reverse time migrations, and CRS stacking. Lately, implementations in credit derivatives pricing have been appearing as well [WSRM12]. Essentially, any mathematical process that can be decomposed into distinct, self-contained sequential calculations can achieve high acceleration on the FPGA architecture.
Figure 5.2
The FPGA architecture as implemented by Maxeler Technologies. The MaxCompiler constructs the dataflow tree which defines the circuit architecture on the FPGA chip. From then on, data from memory is piped through the different dataflow cores until it exits the calculation pipe and is committed to memory. (Photo used by permission of Maxeler Technologies.)
5.4 Application to the Heston model
The implementation of Heston's stochastic volatility model has two aspects: first, the code that runs on the host, and second, the code that defines the circuit architecture of the FPGA and performs the necessary calculations.

Since only repetitive calculations can benefit from the dataflow architecture, certain elements need to run on the host and others on the FPGA card. Maxeler Technologies use the nomenclature of a kernel and a manager. A kernel comprises a set of calculations that produce a distinct result, e.g. a 3-value moving average. The manager's responsibility is to instantiate and administer the life cycle and functions of each kernel assigned to it.
For this implementation, the manager creates numerous pipes within a given MaxCard³ to handle different operations. The more pipes that can be fitted into the available silicon, the better the overall performance of the FPGA. The manager is responsible for creating and populating the pipes with kernels that generate random variates from the Gamma distribution, and with kernels that calculate the next values of the variance and of the underlying price. Once all the prices of the underlying have been generated for every timestep, the results are aggregated back on the host's CPU.

³ The latest models of Maxeler's FPGA cards provide an ever increasing number of resources on board the chip.

Figure 5.3
This figure illustrates how code interacts between the host CPU and the FPGA kernels. (Photo used by permission of Maxeler Technologies.)
Chapter 6
Implementation on GPUs
6.1 Using GPUs to solve large parallel problems
The GPU is a highly specialised parallel processor for accelerating graphical computations. The first card to consolidate the entire 3D and 2D graphics processing pipeline on a single chip, off-loading it from the CPU, was the NVIDIA GeForce 256, released in 1999. The main function of the card was to render a model into an image directly on the GPU chip, and its strength was its specialised computational capacity for floating-point arithmetic.

The card itself has onboard memory whose different access layers are used internally by the different threads running on the GPU blocks. It is the developer's duty to orchestrate the migration of state from the host CPU memory to the worker GPU memory, create the work kernels in the blocks, aggregate the results in the worker memory, and finally return the result to the host CPU memory (see Figure 6.1).
The initial use of this acceleration card was in computer video games, whose high visualisation demands spearheaded the graphics card industry into producing ever-improving GPUs. As a result, GPU performance on parallel calculations started to outperform high-end CPUs because of this specialisation. This was when clients with increased processing needs started to take notice: academic institutions, companies in the Oil & Gas sector, and others began to realise that code could be parallelised on a GPU instead of relying on grids of CPUs. This need gave rise to the niche market of General Purpose computing on GPUs (GPGPUs) for dealing with embarrassingly parallel problems in certain domains. Tables 6.1 and 6.2 detail the most current GPGPU cards available on the market and their specifications.
Figure 6.1
Memory space architecture on a GPU.
Source: "NVIDIA CUDA Programming Guide", version 1.1.
Features                                            Tesla K10
Number and type of GPU                              2 Kepler GK104s
Peak double precision floating point performance    190 Gigaflops
Peak single precision floating point performance    4577 Gigaflops
Memory bandwidth (ECC off)                          320 GB/sec
Memory size (GDDR5)                                 8 GB
CUDA cores                                          3072

Table 6.1
GPGPU specifications for the dual Kepler cards.
Features                        Tesla M2090       Tesla M2075       Tesla M2070-Q
Number and type of GPU          1 Fermi GPU       1 Fermi GPU       1 Fermi GPU
Peak double precision FP        665 Gigaflops     515 Gigaflops     515 Gigaflops
Peak single precision FP        1331 Gigaflops    1030 Gigaflops    1030 Gigaflops
Memory bandwidth (ECC off)      177 GB/sec        150 GB/sec        150 GB/sec
Memory size (GDDR5)             6 GB              6 GB              6 GB
CUDA cores                      512               448               448

Table 6.2
GPGPU specifications from NVIDIA's website.
http://www.nvidia.com/object/tesla-servers.html
A flurry of problems across different disciplines has been implemented on GPUs: problems in chemical engineering [ZVS11], iterative algorithms for linear systems [ESV10], finance [RS09], and bioinformatics [VS11]. These implementations can be rapidly prototyped in MATLAB, using the Parallel Computing Toolbox, and then deployed on heterogeneous platforms including GPUs. There is also the option to develop highly specialised code in C/C++ and the Compute Unified Device Architecture (CUDA) toolkit in order to increase the performance of the bundled executables or to deploy API libraries with the modelled functionality.

The development friction of the latter approach arises when writing CUDA code, which is cumbersome and rigid; niche knowledge of the underlying platform is essential for developing such tools. Recently, however, there have been attempts to bridge the gap between existing, fully tested and well-understood code and the parallelisation offered by GPUs. One of these approaches is a Java-to-GPU compiler: the serialisation of the Java state (e.g. Java objects), the kernel code generation and launch, and the deserialisation of state back into CPU memory are all done automatically. The developer only writes the model logic in Java and the compiler automates the rest of the process (see [PSFW12] for more information).
6.2 MATLAB implementation of the Heston model for the GPGPU
Starting from the existing Heston model shown in Listing A.4, it is possible to augment this model to take advantage of MATLAB's Parallel Computing Toolbox and deploy it on multiple GPUs. There are several ways to do this, exploiting a single GPU card, multiple cards per node, and finally multiple nodes in a grid.

The easiest form of access to the GPU is via the overloaded toolbox functions that act on the gpuArray class. This method transfers data over the PCIe bus into GPU memory; transformations such as addition, subtraction, multiplication, etc. are then applied on the GPU rather than on the CPU, and the calculated data is collected back into CPU memory via the gather method. This high-level approach is useful when quick matrix calculations are needed without any significant need for fine control. However, when the problem dimensionality increases, fine-grained control is required in order to improve the performance of the acceleration.
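A minimal sketch of this high-level access pattern is shown below; the matrix size and the operations performed are illustrative.

N = 4096;
A = gpuArray(rand(N));       % transfer data over PCIe into GPU memory
B = gpuArray(rand(N));
C = A*B + 2*A;               % overloaded operators execute on the GPU
result = gather(C);          % copy the result back into CPU memory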
For this reason it is important to distinguish the independent calculations in the model and take advantage of multiple GPU cards to accelerate them. The Monte Carlo simulation lends itself to this quite naturally, because the individual path calculations are independent of each other and can therefore be executed on multiple GPU cards. This is achieved by executing a parfor loop, whose constituent iterations are distributed across multiple GPUs and even across multiple nodes.

In order to access multiple GPU cards it is necessary to bind them to specific MATLAB workers. This is achieved via the spmd...end construct, as Listing 6.1 shows in more detail.
N = 1000;
A = gpuArray(magic(N));

spmd
    gpuDevice(labindex);        % bind each worker to its own GPU device
end

parfor ix = 1:N                 % distribute the for loop to the workers
    % Do the GPU calculation
    X = myGPUFunction(ix, A);
    % Gather data back into CPU memory
    Xtotal(ix, :) = gather(X);
end

Listing 6.1
Example of multiple GPU assignment to distribute a Matlab function to multiple workers.
The actual implementation of the present Heston model for the GPU platform can be viewed in Listing A.5 in Appendix A. Martin Takáč offered significant implementation guidance and help on the use of the arrayfun function.
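The arrayfun approach applies a scalar function element-wise on the GPU, which suits the per-path updates of the simulation. The sketch below performs a single Euler-type variance update across all paths; the parameter values and the update function are illustrative and are not the code of Listing A.5.

kappa = 2; theta = 0.04; sigma = 0.3; dt = 1/252;   % illustrative parameters
V = gpuArray(0.04*ones(1e6, 1));                    % current variance of each path
Z = gpuArray(randn(1e6, 1));                        % normal increments
step = @(v, z) max(v + kappa*(theta - v)*dt + sigma*sqrt(max(v, 0))*sqrt(dt)*z, 0);
Vnext = arrayfun(step, V, Z);                       % element-wise update on the GPU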
Figure 6.2
Schematic of the multiple cores in the NVIDIA Fermi GPU.
Chapter 7
Implementation on the Techila
cloud
7.1 The Techila cloud platform
Techila, a company based in Finland, operates in the domain of distributed computational problems. These fall into two categories. The first are traditional parallel problems, where the individual distributed processes run on different hosts but require some information to be passed between them; fluid dynamics and finite element models are examples of such parallel problems.
Figure 7.1
Parallel problem. Each computational job accepts input data and uses this input data to compute results. Communication between jobs is required in order for the individual jobs to be completed. (Photo used by permission of Techila.)
On the other hand are the embarrassingly parallel problems. These problems do not require communication between the distributed calculations, since each calculation is independent of all the others. Examples include Monte Carlo simulations, computer graphics rendering, genetic algorithms, brute-force methods in cryptography, Basic Local Alignment Search Tool (BLAST) searches in bioinformatics, and machine learning analytics [DEW+12].
Figure 7.2
Embarrassingly parallel problem. Each process accepts input data and uses this input data to compute results. Communication is limited to receiving input data and transferring output data; no inter-process communication takes place. (Photo used by permission of Techila.)
Techila has produced a computational platform for heterogeneous distributed computing of embarrassingly parallel problems. The benefit of such an approach is the ability to utilise multiple kinds of hardware and operating systems in order to leverage all available computing power for problem solving. The computational resources can be a plethora of instances, from laptops, to idle machines in close proximity, to local servers, and even on-demand instances from cloud providers. In other words, Techila ports and manages the code on a hybrid set of computational resources.
Each computational resource is assigned a worker instance that performs the independent computation. All the workers are administered by a central manager, which interacts with the end user to collect the problem state. As shown in Figure 7.3, the End-User communicates solely with the server, whose job in turn is to segment the problem into its distributed parts and transfer them to the heterogeneous constituency of computational resources.

Figure 7.3
The three-tiered structure of the Techila system. The End-User interacts with the Techila Server, which distributes computations to the Workers. All network communication in the Techila environment is secured using SSL (Secure Socket Layer), meaning all transferred data is encrypted. (Photo used by permission of Techila.)
7.2 Description of platform implementation
For the present case of the Heston model the cloud implementation was achieved