Towards the Use of Machine Learning Algorithms to Predict Human Immunodeficiency Virus Drug Resistance


7 Nov 2013


Yashik Singh
Department of Tele-Health
Nelson R Mandela School of Medicine
singhy@ukzn.ac.za
Outline

Introduction

Techniques

Preliminary Results

Comparison

Conclusions

Future Work
Introduction

“Several expert panels have recommended that drug resistance testing be used to help select the optimal drug therapy” [1]
Developed countries perform testing for patients failing treatment and for those initiating therapy for the first time

Genotypic or phenotypic assays may be used, but genotypic assays are currently more widely used

[1] Jaideep Ravela et al., 2003, Rapid Communication: HIV-1 Protease and Reverse Transcriptase Mutation Patterns Responsible for Discordances Between Genotypic Drug Resistance Interpretation Algorithms, Journal of Acquired Immune Deficiency Syndromes, Vol 33, No 1, 8-14
Introduction

The goals of this project:

Implementation and comparison of computer-based machine learning algorithms

Analysis and interpretation of genetic sequence data in patients failing anti-retroviral therapy.
[Diagram labels: Computer Algorithm; Subtype B genetic data; Measurement of resistance]
Machine Learning Algorithms

Support Vector Machines (SVM)

Gene Expression Programming (GEP)

北ﱥ省拾說I

北ﱥ



ーﵩョ



Neural Network : Multi-Layer Perceptron (MLP)
Technique: SVM

Embeds data into a higher dimensional vector space

Complex equations and optimisations
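The embedding idea can be illustrated with a minimal sketch. It assumes a degree-2 polynomial kernel on two-dimensional inputs; the slides do not say which kernel was used in the project, and the names `phi`, `dot` and `poly_kernel` are illustrative only. The sketch shows that a kernel computes the inner product in the higher dimensional space without building the embedded vectors explicitly. It does not train an SVM.

```python
import math


def phi(x):
    """Explicit 3-D feature map matching the kernel (x . y)^2 on 2-D inputs."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def dot(a, b):
    """Plain inner product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def poly_kernel(x, y):
    """Same inner product as dot(phi(x), phi(y)), computed in the 2-D input space."""
    return dot(x, y) ** 2


# Hypothetical example points, not data from the study.
x, y = (1.0, 2.0), (3.0, -1.0)
explicit = dot(phi(x), phi(y))   # inner product after embedding
implicit = poly_kernel(x, y)     # kernel evaluated directly
print(explicit, implicit)
```

Both values agree up to floating-point rounding, which is why an SVM can work in a high dimensional space while only ever evaluating the kernel on the original inputs.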
Technique: GEP

Uses the principles of evolution to converge on an optimal solution

Automatically finds mathematical expressions that best model the problem
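A minimal sketch of the idea follows, under stated assumptions. Genes are fixed-length strings read breadth-first into expression trees (the head/tail layout used in GEP), but the sketch uses only point mutation and elitist selection, whereas full GEP also applies transposition and recombination. The function set, the target function `x*x + x`, and all names and parameter values are hypothetical choices for illustration, not the configuration used in the project.

```python
import operator
import random

FUNCTIONS = {"+": operator.add, "-": operator.sub, "*": operator.mul}  # all binary
TERMINALS = "x1"            # the input variable x and the constant 1
HEAD = 4                    # head positions may hold functions or terminals
TAIL = HEAD * (2 - 1) + 1   # tail holds terminals only, so every gene decodes


def decode(gene):
    """Read the gene breadth-first into a tree of [symbol, children] nodes."""
    nodes = [[s, []] for s in gene]
    used, i = 1, 0
    while i < used:
        if nodes[i][0] in FUNCTIONS:
            nodes[i][1] = nodes[used:used + 2]
            used += 2
        i += 1
    return nodes[0]


def evaluate(node, x):
    """Evaluate a decoded tree at input value x."""
    sym, kids = node
    if sym in FUNCTIONS:
        return FUNCTIONS[sym](evaluate(kids[0], x), evaluate(kids[1], x))
    return x if sym == "x" else 1.0


def error(gene, data):
    """Sum of squared errors of the gene's expression over (x, y) pairs."""
    tree = decode(gene)
    return sum((evaluate(tree, x) - y) ** 2 for x, y in data)


def random_gene(rng):
    head = [rng.choice("+-*" + TERMINALS) for _ in range(HEAD)]
    tail = [rng.choice(TERMINALS) for _ in range(TAIL)]
    return "".join(head + tail)


def mutate(gene, rng):
    """Point mutation that keeps functions out of the tail."""
    pos = rng.randrange(len(gene))
    pool = "+-*" + TERMINALS if pos < HEAD else TERMINALS
    return gene[:pos] + rng.choice(pool) + gene[pos + 1:]


def evolve(data, generations=40, pop_size=30, seed=0):
    rng = random.Random(seed)
    population = [random_gene(rng) for _ in range(pop_size)]
    history = []  # best error at the start of each generation
    for _ in range(generations):
        population.sort(key=lambda g: error(g, data))
        history.append(error(population[0], data))
        parents = population[: pop_size // 2]
        # Elitism: the best gene survives unchanged, the rest are mutated parents.
        population = [population[0]] + [
            mutate(rng.choice(parents), rng) for _ in range(pop_size - 1)
        ]
    return min(population, key=lambda g: error(g, data)), history


# Hypothetical target: y = x*x + x sampled at a few integer points.
data = [(float(x), float(x * x + x)) for x in range(-3, 4)]
best, history = evolve(data)
print(best, history[0], history[-1])
```

Because the best gene is always carried over, the recorded error can never increase from one generation to the next; whether it reaches zero depends on the seed and the settings.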
Technique: PSO

Optimisation technique based on the choreography of bird flocking

Knowledge of the birds is used to find prey
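The flocking idea can be sketched as follows. Each particle remembers its own best position and is also pulled towards the best position found by the whole swarm. This is a minimal sketch of the standard velocity update with an inertia weight; the objective (`sphere`), the parameter values `w`, `c1`, `c2`, and the search range are assumptions for illustration, not the settings used in the project.

```python
import random


def sphere(p):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(v * v for v in p)


def pso(objective, dim=2, particles=15, iterations=60, seed=0,
        w=0.7, c1=1.5, c2=1.5):
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(particles)]
    vel = [[0.0] * dim for _ in range(particles)]
    pbest = [p[:] for p in pos]                  # each particle's own best position
    pbest_val = [objective(p) for p in pos]
    g = min(range(particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position known to the swarm
    history = [gbest_val]
    for _ in range(iterations):
        for i in range(particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])   # own memory
                             + c2 * r2 * (gbest[d] - pos[i][d]))     # swarm knowledge
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
        history.append(gbest_val)
    return gbest, gbest_val, history


gbest, gbest_val, history = pso(sphere)
print(gbest_val)
```

The swarm's best value is only replaced when a particle finds something better, so `history` never increases.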
Technique: MLP (NN)

A neural network’s foundation is built on the structure of the neuron.

An MLP has many neurons structured in layers.
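The layered structure can be shown with a minimal forward-pass sketch. It assumes sigmoid neurons and hand-picked weights that make a 2-2-1 network compute XOR; the weights and names are illustrative only, and no training is performed. Each neuron takes a weighted sum of its inputs plus a bias and passes it through the activation function, and each layer feeds the next.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs, weights, biases):
    """One layer: every neuron applies sigmoid(weighted sum of inputs + bias)."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


# Hand-picked (hypothetical) weights: hidden neurons act as OR and NAND,
# and the output neuron combines them with AND, which yields XOR.
HIDDEN_W = [[20.0, 20.0], [-20.0, -20.0]]
HIDDEN_B = [-10.0, 30.0]
OUTPUT_W = [[20.0, 20.0]]
OUTPUT_B = [-30.0]


def mlp(inputs):
    hidden = layer(inputs, HIDDEN_W, HIDDEN_B)
    return layer(hidden, OUTPUT_W, OUTPUT_B)[0]


outputs = [round(mlp([a, b])) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
print(outputs)  # [0, 1, 1, 0]
```

A single neuron cannot represent XOR, which is why the hidden layer is needed here.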
Preliminary Results
[Figure: Coefficient of correlation (r) for regression, by technique: SVM, GEP, PSO, MLP]
[Figure: Classification accuracy (%) of each technique: SVM, PSO, GEP, MLP]
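The two measures reported in the charts can be computed as in the sketch below. It assumes that the coefficient of correlation is Pearson's r (the slides do not state which coefficient was used) and that accuracy is the percentage of matching class labels. The numbers and the labels "R" and "S" are made-up toy values, not results from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between predicted and measured values."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def accuracy(predicted, actual):
    """Percentage of predictions that match the reference labels."""
    hits = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * hits / len(actual)


# Toy regression example: predictions perfectly proportional to measurements.
r = pearson_r([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])

# Toy classification example: three of four labels agree.
acc = accuracy(["R", "S", "R", "S"], ["R", "S", "S", "S"])
print(r, acc)
```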
Comparison
Significantly more accurate than | As accurate as
Associative Classification | Geno2Pheno
Committee Neural Network | MLP (Energy)
Decision tree | VG/BD Guidelines 6.0
Dr Sequan
AntiRetroScan
KNN
RBNN
RegaInst
Retrogram
Stanford HIV-db algorithm*
Conclusions

GEP and PSO are relatively new additions to the field of classification and regression

The comparison of GEP and PSO with rules-based algorithms, such as the widely used Stanford HIV-DB, is novel according to the author’s current understanding

Previous research mostly produced algorithms that perform classification

By implementing such algorithms one hopes to understand new, and confirm existing, knowledge in this domain
Future Work

Analyse Subtype C and correlate the different mutation patterns in Subtypes B and C

Determine whether subtype C resistance can be predicted with an algorithm trained on subtype B data

120 subtype C sequences

86% accuracy and 0.74 correlation coefficient
Future Work

Correlate genotypic data with clinical, viral
and immunological outcomes

Important in developing countries

Use adaptive boosting
Merge results obtained from major algorithms into one interpretation
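One simple way to merge several algorithms' results is a majority vote, sketched below. The slides do not say how the merge would be done, so the voting rule, the tie handling, the function name `merge_calls` and the example calls are all assumptions for illustration.

```python
from collections import Counter


def merge_calls(calls):
    """Return the most common call; an exact tie is reported as 'undetermined'."""
    counts = Counter(calls.values()).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return "undetermined"
    return counts[0][0]


# Hypothetical per-algorithm calls for one drug and one sequence.
calls = {
    "SVM": "resistant",
    "GEP": "resistant",
    "PSO": "susceptible",
    "MLP": "resistant",
}
merged = merge_calls(calls)
print(merged)  # resistant
```

A weighted vote, for example one that weights each algorithm by its measured accuracy, would be a natural refinement of the same idea.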
Acknowledgements

Prof. Hajek, Dr. Seebregts, Mr. Moodley,
Prof. Mars, Dr De Oliveria

BICSA and NRF grants
Thank you for your time