Accelerating Data Movement To Support Lattice QCD Computations


Dec 8, 2013


PERFORMANCE
Below are results from one Lattice researcher’s use of Globus Online over the past few months:
0RYHG*%ÀOHVLQPLQXWHV

²,PSURYHGWUDQVIHUUDWHIURP0EVHFWR0EVHF

²6DPHWUDQVIHUZRXOGKDYHWDNHQRYHUGD\VZLWKVFS
0RYHG*%LQPLQXWHV
0RYHGa7%LQKRXUV
0RYHG*%LQPLQXWHV
7RGHPRQVWUDWHKRZZHOO*OREXV2QOLQH·VSHUIRUPDQFHRSWLPL]DWLRQORJLFSHUIRUPVLQSUDFWLFDO
VLWXDWLRQVZHFRPSDUHG*OREXV2QOLQHSHUIRUPDQFHZLWKWKDWDFKLHYHGZKHQXVLQJVFSDQG
WKHJOREXVXUOFRS\*8&FOLHQW)LJXUHVKRZVWKHSHUIRUPDQFHRIYDULRXVWUDQVIHUPHFKDQLVPV
IRU WUDQVIHUV RYHU D KLJKVSHHG ZLGH DUHD QHWZRUN (61HW EHWZHHQ WZR KLJKSHUIRUPDQFH
SDUDOOHOVWRUDJHV\VWHPVDW$/&)DQG1(56&,WKDVWREHQRWHGWKDWWKHSHUIRUPDQFHRI*OREXV
2QOLQHLVEHWWHUWKDQRUVLPLODUWRWKHSHUIRUPDQFHRIWXQHG*8&WXQHGE\DGDWDPRYHPHQW
H[SHUW $FWXDOO\ WKH FRPSDULVRQ ZLWK WKH XQWXQHG *8& DQG 6&3 VKRZV WKH SHUIRUPDQFH
JDLQVWKDWPDQ\XVHUVFDQH[SHFWWRJDLQIURP*OREXV2QOLQH
)LJXUH'DWDWUDQVIHUSHUIRUPDQFHEHWZHHQ$/&)DQG1(56&IRUYDULRXVÀOHVL]HVDQGWUDQVIHUWHFKQRORJLHV








(

(


(

(
(

(

(
Rate (Mbits/sec)

File Size (bytes)
JR
JX
c
sc
p
WXQHGJX
c
ARCHITECTURE
Globus Online comprises:


$VHWRIXVHUJDWHZD\VZKLFKH[LVWWRSURYLGHLQWHUDFWLRQEHWZHHQXVHUVDQGWKHV\VWHP
YLDWKH:HE&/,DQG5(67LQWHUIDFHV


$VHWRIRQHRUPRUHZRUNHUVZKLFKH[LVWWRRUFKHVWUDWHGDWDWUDQVIHUVDQGSHUIRUP
RWKHUWDVNVVXFKDVQRWLI\XVHUVRIFKDQJHVLQVWDWH


$SURÀOHVDQGVWDWHGDWDEDVHXVHGWRPDLQWDLQXVHUSURÀOHVDQGÀOHWUDQVIHU

VWDWHLQIRUPDWLRQ
Personal (local) machines are connected to Globus Online using Globus Connect, a one-click download-and-install application for Mac OS, Linux, and Windows. Globus Connect comprises a GridFTP server that runs as a normal user (rather than as root from inetd like a typical GridFTP server) and a gsi-openssh client configured to establish an authenticated connection to a Globus Online relay server in order to tunnel the GridFTP control channel requests from Globus Online. Globus Connect only establishes outbound connections and thus can work behind a firewall or other network interface device that does not allow inbound connections.



Front-end architecture and performance

- Web rendering engine and supporting core services are developed in Python, using the Pyramid web service stack
- Globus Online services are lightweight, highly scalable services that present RESTful interfaces, leveraging HTTP semantics for ease of use, performance, and scalability
- User data is stored in Cassandra, a highly scalable second-generation distributed database originally developed by Facebook and now supported as an Apache project
- URL rules and proxy logic are handled by NGINX, a high-performance HTTP server and reverse proxy
- Web pages are defined with a simple yet powerful declarative model that enables assembly to be performed client-side or server-side
- Web widgets are developed in pure DHTML. No presentation logic is performed server-side


Security and robustness

$OOZHEVLWHWUDIÀFLVKDQGOHGRYHU+7736WRHQVXUHVHFXULW\RIVHQVLWLYHGDWDDQGFRRNLHV

6XSSRUWIRUÀQHJUDLQHGDFFHVVFRQWUROWRXVHULQIRUPDWLRQ

$OOXVHUSURÀOHLQIRUPDWLRQLQFOXGLQJSHUVRQDOGDWDDQGVHFXULW\FUHGHQWLDOVLVVWURQJO\
HQFU\SWHGLQVWRUDJHWRIXUWKHUSURWHFWDJDLQVWXQDXWKRUL]HGDFFHVV



Reliability and scalability

- Globus Online is hosted on Amazon’s Elastic Compute Cloud (EC2) for optimum performance and scalability
- Core services are hosted on three separate Amazon data centers for redundancy
- Globus Online easily scales with the addition of new EC2 instances
- Traffic to the Web site is load-balanced by Amazon Elastic Load Balancer (ELB) for scalable reliability
SYSTEM OVERVIEW
Globus Online is a hosted file transfer system that provides simple, reliable data movement for virtually any researcher and facility.
Globus Online makes high-performance file transfer capabilities, traditionally available only on expensive, special-purpose software systems, accessible to any researcher with an Internet connection and a laptop. Users simply sign up for the service, log in, access their data using existing credentials, and then click or issue a simple command to transfer files.
Key features:
INTERFACES
Globus Online features a web interface, a command line interface, and a REST API for moving files securely and reliably.
Web UI: Using just a web browser, files or entire directories from one endpoint can be selected and transferred to another endpoint.
CLI Commands: For those more comfortable at a command prompt, the ‘scp’ and ‘transfer’ commands provide additional options for creating and controlling file transfer requests. You may use a familiar scp-like interface for moving files between two endpoints, or take advantage of the Globus Online transfer command for increased flexibility in specifying a list of individual files or directories for transfer between endpoints.
Transfer REST API: This “RESTful” Globus Online interface is designed for developers who need to deliver solutions to a community of users or integrate with an external website or application. The Transfer REST API makes it possible to integrate reliable data movement into users’ HPC workflows, such as integrating with Java clients or Web-based portals, with no requirement for specialized software.
[Figure (Amazon Web Services hosting diagram). Labels:
 Amazon Elastic Compute Cloud (EC2): instances distributed across multiple data centers.
 Amazon Elastic Load Balancer (ELB): traffic load balanced with EC2.
 Amazon Simple Storage Service (S3): data is securely archived on S3.
 Three Globus Online EC2 instances, each running Ubuntu Linux Server, NGINX, Pyramid, Cassandra, and DHTML.
 Each Cassandra instance is configured to replicate data on […] or more instances for increased durability.
 Data is stored encrypted to prevent unauthorized access.]
USER SCENARIO: LATTICE QCD DATA MOVEMENT
Scientists at the MIMD Lattice Computation (MILC) Research Collaboration are using Globus Online to simplify the process of frequently moving large files among different supercomputing centers, including TeraGrid machines like Kraken, Athena, Lincoln, and Longhorn, and DOE computers like Franklin, Intrepid, and Hopper. According to Steven Gottlieb, Distinguished Professor at Indiana University and a senior member of the MILC Collaboration, the resulting speedup has been “very impressive”: on one occasion, one hundred […]-gigabyte files were transferred in just […] minutes, a process that would have taken over […] days with scp. The reduction in delays and manual intervention has “made a big difference” in the convenience of moving projects between centers.
Globus Online: Reliable File Transfer. No IT Required.
CHALLENGE
)LOH WUDQVIHU LV ERWK D FULWLFDO DQG IUXVWUDWLQJ DVSHFW RI FRPSXWH
LQWHQVLYH UHVHDUFK VXFK DV/DWWLFH TXDQWXP FKURPRG\QDPLFV
)RUDUHODWLYHO\PXQGDQHWDVNPRYLQJWHUDE\WHVRIGDWDUHOLDEO\
DQG HIÀFLHQWO\ FDQ EH VXUSULVLQJO\ FRPSOLFDWHG RQH PXVW GHDO
ZLWK GHWHUPLQLQJ DYDLODEOH SURWRFROV QHJRWLDWLQJ ÀUHZDOOV DQG
DXWKHQWLFDWLRQ SUHVWDJLQJ ÀOHV IRU WUDQVIHU GHWHFWLQJ DQG
UHVSRQGLQJ WR IDLOXUHV GLDJQRVLQJ QHWZRUN PLVFRQÀJXUDWLRQV
FRQÀJXULQJ VRIWZDUH HWF 2IWHQ/DWWLFH VFLHQWLVWV XVH
VXSHUFRPSXWHUVZKHUHWLPHKDVEHHQDOORFDWHGVXFKDV7HUD*ULG
RU'2(PDFKLQHV²EXWXVHRIWKHVHIDFLOLWLHVLVFRVWO\DQGFDQ
EH WLPHFRQVXPLQJ VLQFH VFLHQWLVWV PXVW GHDO ZLWK PDQDJLQJ
ÀOH WUDQVIHUV WR DQG IURP WKHVH PDFKLQHV EDVHG RQ ZKHUHYHU
FRPSXWLQJWLPHKDSSHQVWREHDYDLODEOH
,IÀOHWUDQVIHUFRXOGEHDFFHOHUDWHGDQGVLPSOLÀHGVFLHQWLVWVZRXOGEHQHÀWQRWRQO\IURPWLPH
DQGFRVWVDYLQJVEXWDOVRIURPWKHHOLPLQDWLRQRIWHGLRXVUHSHWLWLYHWDVNVWKDWWDNHWKHLUDWWHQWLRQ
DZD\IURPWKHLUFRUHUHVHDUFK

“Globus Online frees up my time to do more creative work rather than typing scp commands or devising scripts to initiate and monitor progress.”
[Figure (architecture schematic). Labels: User (four shown), Request collector, Worker, Profiles & State, GridFTP server (two shown), Notification target.]
)LJXUH+RVWLQJRQ$PD]RQ:HE6HUYLFHVSURYLGHVVFDODELOLW\IRUFRUHVHUYLFHV
)LJXUH6FKHPDWLFRIWKH*OREXV2QOLQHDUFKLWHFWXUH
“I moved 100 7.3 GB files in about […] hours. The same transfer would have taken over 3 days with scp.”



[Figure (transfer flow diagram). Labels: User, Globus Online, Globus Connect on “MyDesktop”, GridFTP server at “SiteA”.]
(2) User makes request to Globus Online, e.g., “Transfer data from MyDesktop to SiteA”
(3) Globus Online forwards requests to Globus Connect
(4) Globus Connect establishes data channel connection to SiteA and transfers data
Figure […]: Globus Connect: High-level representation of transfer flow

- Reliability through automatic fault recovery and integrity checking
- Integrated management of transfers across multiple security domains with multiple user identities
- Optimized performance by auto-tuning transfers based on number of files and file sizes
- Performance monitoring, fine-grained activity logging, and status reporting
- Conditional file synchronization based on attributes such as size, timestamp, or checksum mismatch
- Simplified transfers from mass storage using pre-staging of data and automatic retries
- Transfer from machines behind firewalls and NATs using a lightweight client
- Simple creation and maintenance of personal endpoints using Globus Connect
- New features immediately available to scientists as a result of the Software-as-a-Service approach
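The conditional synchronization feature in the list above amounts to a per-file decision: copy only when the chosen attribute differs. The sketch below shows one way such a check could look for two local files. It is an illustration only: the function name, the mode names, and the choice of SHA-256 are assumptions, and a hosted service would compare files across remote endpoints rather than local paths.

```python
import hashlib
import os


def _sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_transfer(source, destination, mode="size"):
    """Return True if `source` should be copied over `destination`.

    mode: "size" (sizes differ), "mtime" (source is newer),
          or "checksum" (contents differ).
    """
    if mode not in ("size", "mtime", "checksum"):
        raise ValueError(f"unknown mode: {mode}")
    if not os.path.exists(destination):
        return True  # nothing at the destination yet
    if mode == "size":
        return os.path.getsize(source) != os.path.getsize(destination)
    if mode == "mtime":
        return os.path.getmtime(source) > os.path.getmtime(destination)
    return _sha256(source) != _sha256(destination)
```

The modes trade cost for certainty: a size comparison is cheap but misses same-length edits, while a checksum catches any content change at the price of reading both files in full.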
(1) Register Globus Connect instance with Globus Online
“Globus Online is the most beneficial grid technology I have ever seen.”