Report to the Lilly Endowment, Inc.
Grant Number 2008 1639-000

18 Month Program Report
January 1 - May 31, 2010

Submitted by:

Michael A. McRobbie
Indiana University President

Bradley C. Wheeler
Vice President for Information Technology, CIO, and Dean of Information Technology

Craig A. Stewart
Executive Director, Pervasive Technology Institute, Associate Dean of Research Technologies
 
Table of Contents

I. Introduction and Executive Summary
II. Digital Science Center
III. Data to Insight Center
IV. Center for Applied Cybersecurity Research
V. Research Technologies
VI. Bringing Distinction to the State of Indiana
VII. Institute Coordination and Support
VIII. Management and Operations
IX. Economic Development
X. External Relations and Strategic Initiatives
XI. Educating the Residents of Indiana and Beyond
Appendix 1: Technology Disclosures during the Reporting Period
Appendix 2: Open Source Software
Appendix 3: Online Services
Appendix 4: Publications (January 1 - May 31, 2010)
Appendix 5: Presentations (January 1 - May 31, 2010)
Appendix 6: Active and Pending Grants
Appendix 7: Interim Financial Report
Appendix 8: Education, Outreach and Training Events
Appendix 9: Public and Governmental Service Activities
Appendix 10: News and Media Placements
Appendix 11: Glossary of Technical Terms Used in this Report
 
 
I. Introduction and Executive Summary

Pervasive Technology Institute (PTI) has enjoyed another successful period, both in the receipt of additional external grants and in its participation in several projects of national and international significance. PTI has gained a solid national reputation for Indiana University and the state of Indiana in the areas of high performance computing, data management and preservation, computational support of scientific research, and security and privacy policymaking.

Although the announcement came shortly after the close of the current period, it is notable that Indiana University was recently named by Computerworld magazine as one of the 100 best places to work in IT (http://www.computerworld.com/spring/bp/detail/767). While this honor is shared by all IT organizations within the IU system, the recognition specifically pointed to factors such as the opportunity for employees to participate in leading research, publish in scholarly journals, and present at international conferences as contributing to IU receiving this distinction. In these ways, PTI has contributed significantly to the award criteria and is an important component of IU's success in attracting and retaining top intellectual talent for the state of Indiana.
 
 
Other major highlights for the reporting period are summarized below in non-technical language. More detailed and technical descriptions can be found in the body of the report. This report has been structured to provide nontechnical bullet lists in this section and at the start of each of our other especially technical sections. Another important change to the standard format of this report is the inclusion of IU's Research Technologies division as a separate entity within the PTI report, rather than folding its contributions into each center's section. RT contributes so substantially to the success of PTI, and its activities are so cross-cutting, that it seemed important to give RT its own section in order to most accurately and completely reflect its contribution to the success of PTI.
 
 
PTI highlights for the reporting period:

• In a time of unprecedented national financial hardship, PTI has enjoyed continued grant success. PTI received additional grant awards totaling nearly $4 million (bringing its external funding total to more than $22 million) and submitted a remarkable 54 currently pending grants totaling more than $82.5 million. According to a recent report by the US Science Coalition, "When public money is invested in university-based basic research there is tremendous return on investment. Research creates jobs directly for those involved and indirectly for many others, through innovations that lead to new technologies, new industries and new companies." (http://www.sciencecoalition.org/successstories/index.cfm)

• Fred H. Cate, director of the PTI Center for Applied Cybersecurity Research, served as a policy advisor on technology privacy and security, making presentations to the US Department of Commerce, the US Senate Judiciary Committee's Subcommittee on Crime and Drugs, and the US Federal Trade Commission. Testimony provided by Cate was cited in numerous national media outlets, including the New York Times.

• During the reporting period, the Digital Science Center, along with the Research Technologies Systems and Applications groups, completed the initial stage of hardware installation and testing for the FutureGrid project, a complicated and delicate process that required significant technical expertise and effort. The FutureGrid project places Indiana and Indiana University at the helm of one of the most important national efforts related to the future of technology and scientific research. Supported by a $10 million grant from the National Science Foundation and led by PTI's own Geoffrey Fox, FutureGrid provides a testbed for the most significant emerging grid and cloud technologies. These are the technologies expected to drive global business and scientific research in the coming decades. The project will be used to define the future of US national computing infrastructure and contributes significantly to US competitiveness in the sciences.

• The DSC, along with Research Technologies Systems, completed installation of more than 21 teraflops of computing power to support the Polar Grid project during the reporting period. The Polar Grid project, which is funded by a series of grants from the National Science Foundation, has been enormously successful and has continued to grow. The project is creating a computational grid in the polar regions to support research by NASA and the Center for Remote Sensing of Ice Sheets (CReSIS) on the earth's rapidly melting polar ice sheets. Rising sea levels created by melting ice sheets threaten coastal areas with flooding and endanger wildlife. The computational power provided by Polar Grid is allowing scientists to begin processing data while still in the field and has greatly increased the speed at which discoveries can be made in this critical race against time. During the period, RT Systems employees traveled to Greenland, Antarctica, and Chile to install and upgrade Polar Grid systems.

• The Data to Insight Center (D2I) organized and led Indiana University's participation in a $20M proposal to the National Science Foundation Sustainable Digital Data Preservation and Access Network Partners (DataNet) program. The project would help to develop techniques to preserve valuable data related to meteorological science. Word on the proposal, which received a successful NSF site visit in February 2010, is expected in summer 2010.

• D2I made significant contributions to the Vortex2 project, the largest national effort to date to understand the formation and behavior of tornadoes. D2I's LEAD II technology provided real-time forecasts to handheld devices used by storm chasers in the field.
 
 
 
 
 
 
 
Scholarly Accomplishment

Group | Publications | Technical Presentations | Inventions Disclosed | Open Source Software Distributed | Online Services Provided | Public and Governmental Service Activities
Digital Science Center | 63 | 58 | 1 | 9 | 17 | 1
Data to Insight Center | 5 | 14 | 0 | 3 | 1 | 0
Center for Applied Cybersecurity Research | 26 | 18 | 0 | 0 | 0 | 0
UITS Research Technologies | 1 | 11 | 1 | 0 | 0 | 0
Pervasive Technology Institute Total | 95 | 101 | 1 | 12 | 18 | 1
 
 
Educating the 21st Century Workforce

Group | Undergraduate Student Employees/Interns | M.S. Students Employed or Supported | Ph.D. Students Employed or Supported | Undergraduate Degrees Awarded | M.S. Degrees Awarded | Ph.D. Degrees Awarded | Education, Outreach and Training Events
Digital Science Center | 0 | 3 | 18 | 0 | 4 | 3 | 11
Data to Insight Center | 5 | 26 | 13 | 3 | 3 | 0 | 8
Center for Applied Cybersecurity Research | 0 | 0 | 0 | 0 | 0 | 0 | 13
Pervasive Technology Institute Total | 5 | 29 | 31 | 3 | 7 | 3 | 32
 
 
 
 
 
 
Grant-related Activity

Group | Number of Grant and Contract Proposals Submitted | Total Dollar Amount of Proposals Submitted | Number of Grant and Contract Proposals Awarded | Total Dollar Amount of Proposals Awarded
Digital Science Center | 29 | $15,127,601 | 5 | $756,662
Data to Insight Center | 10 | $27,074,510 | 3 | $1,242,066
Center for Applied Cybersecurity Research | 8 | $14,597,012 | 3 | $149,786
UITS Research Technologies (PTI Related) | 7 | $25,738,276 | 1 | $1,839,949
Pervasive Technology Institute Total | 54 | $82,537,399 | 12 | $3,988,463
 
 
 
 
 
 
II. Digital Science Center
Geoffrey C. Fox, Director

II.1 Digital Science Center Mission and Activity Summary
 
 
The Digital Science Center (DSC) focuses on creating an intuitively usable cyberinfrastructure with tremendous capabilities for supporting collaboration and computation. Easy-to-use, human-centered interfaces to cyberinfrastructure created by the Digital Science Center will enable the many thousands of researchers in the public and private sectors to use the capabilities of cyberinfrastructure and accelerate innovation and discovery.
 
 
The DSC includes the following labs and support units:

• Community Grids Lab - Geoffrey Fox, Director; Marlon Pierce, Gregor von Laszewski, and Judy Qiu, Assistant Directors
• Complex Networks and Systems Group - Alex Vespignani, Director
• Open Systems Lab - Andrew Lumsdaine, Director
• University Information Technology Services (UITS) Research Technologies (RT) Applications Division - D. Scott McCaulay, Director
• UITS/RT Systems Division - Matt Link, Director
 
 
Center Highlights January 1 - May 31, 2010

The following bullet list provides a non-technical overview of accomplishments for the period. More detailed and technical descriptions appear in the section that follows.
 
 


• With the Research Technologies Systems group, the Digital Science Center made significant progress on its FutureGrid project. FutureGrid is funded by a $10 million grant from the National Science Foundation and puts IU in a leadership role in one of the largest and most important research efforts in U.S. computational science. FutureGrid is a national testbed for emerging grid and cloud computing technologies that hold tremendous potential for business and scientific research. FutureGrid will help to define the way the NSF provides computing power to scientists in the coming decades and will have a significant impact on U.S. competitiveness in scientific research. During the reporting period, the main infrastructure for FutureGrid was completed. A site visit by the NSF is scheduled in July to approve the hardware and open the system for use by scientists.
 


• The DSC, along with Research Technologies Systems, completed installation of more than 21 teraflops of computing power to support the Polar Grid project during the reporting period. The Polar Grid project, which is funded by a series of grants from the National Science Foundation, has been enormously successful and has continued to grow. The project is creating a computational grid in the polar regions to support research by NASA and the Center for Remote Sensing of Ice Sheets (CReSIS) on the earth's rapidly melting polar ice sheets. Rising sea levels created by melting ice sheets threaten coastal areas with flooding and endanger wildlife. The computational power provided by Polar Grid is allowing scientists to begin processing data while still in the field and has greatly increased the speed at which discoveries can be made in this critical race against time. During the period, RT Systems employees traveled to Greenland, Antarctica, and Chile to install and upgrade Polar Grid systems.
 


• The Digital Science Center continues as a leader in the development of portals and gateways. Portals and gateways are online services that help scientists gain easy access to the supercomputers they need to perform their research. Using supercomputers can be a significant challenge because there has traditionally been a steep learning curve. Portals and gateways allow scientists to more easily access advanced technology without requiring a deep understanding of how that technology operates. During the reporting period, the DSC's QuakeSim earthquake modeling portal played a crucial role in research conducted by NASA on the Baja California earthquake.
 


• During the reporting period, the Community Grids Lab had an important software release of its "Twister" program, a powerful tool that helps scientists find meaning in very large data sets. Twister improves upon Google's popular MapReduce software tool, achieving higher performance, faster data transfers, and reduced time to process vast data sets for data mining and machine learning applications. Twister has a great deal of potential to increase the speed of scientific discovery, especially in areas of biomedical research.
 


• The Open Systems Lab had an active period, completing one major project and reaching milestones in several others. The OSL contributes to scientific and business competitiveness in Indiana and the U.S. by providing valuable open source software that optimizes high performance computers and is freely available to the scientific research and business communities.
 


• The Complex Networks and Systems (CNetS) group had another successful period. A primary research focus involves modeling the spread of infectious diseases, including H1N1 and HIV in parts of Africa, in order to help health officials make decisions about how to prevent or slow the spread of illness in populations. During the period, CNetS director Alex Vespignani was featured in the prestigious international journals Science and Nature, describing his work on modeling the H1N1 pandemic. Vespignani's predictions about the spread of H1N1 were found to be exceptionally accurate, bringing him international recognition for the modeling techniques developed in his lab.
 
 
 
 
 
 
Scholarly Accomplishment

A summary of the scholarly accomplishments of the Digital Science Center during this reporting period is provided below:

Group | Publications | Technical Presentations | Inventions Disclosed | Open Source Software Distributed | Online Services Provided | Public and Governmental Service Activities
Community Grids Lab | 45 | 37 | 1 | 4 | 13 | 1
Open Systems Lab | 6 | 5 | 0 | 3 | 1 | 0
Complex Networks and Systems Group | 12 | 16 | 0 | 2 | 3 | 0
Digital Science Center Total | 63 | 58 | 1 | 9 | 17 | 1
 
 
 
Educational Activities

The following table provides a summary of the educational activities of the DSC during this reporting period:

Group | Undergraduate Student Employees/Interns | M.S. Students Employed or Supported by DSC | Ph.D. Students Employed or Supported by DSC | Undergraduate Degrees Awarded | M.S. Degrees Awarded | Ph.D. Degrees Awarded | Education, Outreach and Training Events
Community Grids Lab | 0 | 3 | 0 | 0 | 3 | 0 | 6
Open Systems Lab | 0 | 0 | 10 | 0 | 0 | 1 | 3
Complex Networks and Systems Group | 0 | 0 | 8 | 0 | 1 | 2 | 2
Digital Science Center Total | 0 | 3 | 18 | 0 | 4 | 3 | 11
 
 
 
 
 
 
Funded Research

The following table provides a summary of grants submitted and grants received by the DSC during the current reporting period:

Group | Number of Grant and Contract Proposals Submitted | Total Dollar Amount of Proposals Submitted | Number of Grant and Contract Proposals Awarded | Total Dollar Amount of Proposals Awarded
Community Grids Lab | 10 | $8,476,235 | 2 | $332,792
Open Systems Lab | 6 | $3,643,489 | 1 | $50,321
Complex Networks and Systems Group | 13 | $3,007,877 | 2 | $373,549
Digital Science Center Total | 29 | $15,127,601 | 5 | $756,662
 
 
II.2 Digital Science Center Research

The following section includes research highlights for the Digital Science Center as a whole and for each lab and group within the DSC.

II.2.1 Center-wide Research

Projects Achieving Major Milestones

FutureGrid (With Research Technologies Systems and Applications Groups)
 
The FutureGrid project, which was announced in the previous report and started in the fall of 2009, is a testbed for emerging technology related to grid and cloud computing. The project places Indiana University at the helm of one of the most important, leading-edge projects in the field of computational science today. It is a national collaboration led by PTI's Geoffrey Fox and supported by a $10 million grant from the National Science Foundation. The goal of FutureGrid is to allow the U.S. science and business communities to test the most promising new supercomputing technologies in order to plan the next generation of national computational infrastructure that will be provided by the National Science Foundation. The NSF currently provides national computational resources to the U.S. scientific community through supercomputing centers and networks such as the TeraGrid. FutureGrid will help NSF to establish future scientific research networks in order to preserve U.S. competitiveness in science and business.
 
 
FutureGrid focuses on cloud technologies as the emerging computational paradigm of the coming decades. Wikipedia defines cloud computing as "Internet-based computing, whereby shared resources, software, and information are provided to computers and other devices on demand, like the electricity grid." Cloud computing supports research and business by providing a single access point to numerous computational resources that lie "in the cloud" without requiring that the user know or understand the complex technology supporting them. Businesses such as Google and Amazon are already relying heavily on cloud computing to support their operations and are proving it to be a critical emerging technology.
 
 
The current reporting period has been highly active for FutureGrid, as the main framework of the testbed was established and tested during this time. One of the major milestones is the acceptance by the NSF of the computing hardware that makes up the grid testbed. This is still in progress but should be completed by the end of June 2010. The hardware that will become available in the last week of June is listed in the table below. At present, the FutureGrid team is working on a software architecture that allows different software stacks to be dynamically provisioned onto the FutureGrid hardware at the user's request. FutureGrid uses a concept called "raining" that supports virtual environments, helping to minimize overhead and maximize performance for the research scientists (see figure below).
 
 
 
 
 
 

High-level hardware specifications for systems to be included as part of FutureGrid are listed below. The testbed will help to define the next national computing grid in the U.S., supporting U.S. competitiveness in science and industry.

System type | # CPUs | # Cores | TFLOPS | Total RAM (GB) | Secondary storage (TB) | Site
IBM iDataPlex (sierra) | 168 | 672 | 7 | 2688 | 72 | SDSC
Cray XT5m (xray) | 168 | 672 | 6 | 1344 | 335 | IU

[Figure: FutureGrid "rains" an environment on the testbed suitable for the user to conduct experiments. It reduces overhead and maximizes performance for researchers.]

The coming period will be an exciting time for the FutureGrid project, as the hardware officially becomes available for use by the community and large-scale testing of these cutting-edge cloud technologies begins.
 
 
Portals and Gateways (with Research Technologies Applications Group)

The Research Technologies Applications (RT-A) group at Indiana University continues to partner with the PTI Digital Science Center to address usability issues in scientific computing through the development of portals and gateways. Today's multicore computers offer unparalleled computing power for scientific research, but the barriers to entry can be quite high. Portals and gateways provide easy-to-use single entry points for scientists to access high performance computers and other advanced technology essential to their research, without requiring that they have an in-depth understanding of the computers themselves. During the reporting period, DSC and RT-A staff worked together to release tools and provide support for science gateways as part of the Open Science Grid and Linked Environments for Atmospheric Discovery. Portals and gateways developed by PTI are helping U.S. scientists be more competitive by giving them easy access to some of the most powerful scientific computing available.
 
 
PolarGrid (With Research Technologies Systems Group)

The IU-led Polar Grid project is creating a high performance computational grid in the northern and southern polar regions in order to process data collected about the rapidly melting ice sheets. Funded by a grant from the National Science Foundation, Polar Grid allows scientists to process ice sheet data while still in the field, shortening the time between data collection and discovery. Because the melting sea ice has potentially serious environmental consequences for low-lying and coastal areas, this is a problem that must be understood and mitigated as quickly as possible. Polar Grid is significantly improving the speed at which discoveries about polar ice can be made.

During the reporting period, the Research Technologies Systems group installed 21.9 TFLOPS of HPC systems for the Polar Grid project. A portion of that system, 64 nodes, has been installed separately and will be relocated to Elizabeth City State University (ECSU) later in 2010. ECSU is a partner on the PolarGrid award and is a minority-serving institution in North Carolina. In addition to installing this new system at IU in Bloomington, two RT-S staff members traveled to Thule in Greenland for fieldwork related to the PolarGrid project. The expedition was led as part of NASA's IceBridge mission, the largest airborne survey ever flown of Earth's polar ice. RT-S also worked closely with the Center for Remote Sensing of Ice Sheets (CReSIS), headquartered at the University of Kansas, on the IceBridge mission. RT-S is working closely with CReSIS and RT-A to port the current analysis code from runtime Matlab libraries to a compiled environment.
 
 
 
 
 
 
II.2.2 Community Grids Lab

The mission of the Community Grids Lab (CGL) is to create the technology that will enable grid computing to help solve important scientific problems. In creating new global communities, grid computing will open the way to new possibilities for e-Business and e-Science. The CGL focuses on creating new technology infrastructure and applications that will enable distributed business enterprises and cyberinfrastructure for distributed science and engineering. Computers and networks are getting faster, and the distinction between computers and the network is blurring. This points to a future where individuals and corporations interact with grid-based applications without needing to explicitly manage the underlying technology details. CGL's focus on applications has spawned much cross-disciplinary collaboration in research and development of scientific and business applications. A current major emphasis is on earth science and particle physics, with other projects in education, biocomplexity, chemistry, apparel design, digital film production, and sports informatics.
 
 
Projects Achieving Major Milestones

Open Grid Computing Environments
Funding Agency: National Science Foundation

The Open Grid Computing Environments (OGCE) project creates open source software for building science gateways and consults with many major participants in the TeraGrid Science Gateway program. Science gateways are web-based access points and tools that make it easier for scientists to use advanced computing technology by greatly reducing the amount of computational expertise required to run experiments using supercomputers and other advanced technology. During this period the OGCE completed its preliminary integration of several major components into a single build environment: the OGCE Google-compatible Gadget container, the XRegistry service registry, the GFAC application factory tool, the XBaya workflow composer, the Registry gadget, and the Experiment Builder gadget. All of these tools can now be compiled and deployed together, or on separate servers, using a single build command. The OGCE also provided integration and consulting support for the following science gateways: GridChem (NCSA), a computational chemistry gateway, is now using the XBaya workflow composer; UltraScan (UTHSCSA), a biophysics gateway, is using OGCE's GFAC and supporting tools to prototype its new job submission infrastructure; and the Expressed Sequence Tag (EST) Pipeline Portal is using the OGCE's advanced job submission tools to run tens of thousands of jobs on both local Indiana University and TeraGrid resources. The completion of this milestone should greatly improve the ability of scientists to use TeraGrid computing resources.
 
 
 
[Figure: OGCE's workflow composer tool integrated with the GridChem science gateway's middleware. The figure shows a computational chemistry workflow chain of services (represented as boxes) that combine the CHARMM and Gaussian applications to calculate molecular structures.]

QuakeSim
Funding Agency: NASA

QuakeSim is a NASA-funded project to build a science gateway and supporting Web services for the earthquake science community. QuakeSim includes both earthquake fault spatial deformation and GPS time series analysis tools. During the current reporting period, we made several major upgrades to the deployed infrastructure, including the ability to create synthetic InSAR fringe diagrams that can be compared to direct observation. These tools were used prominently by PI Andrea Donnellan in studies of the aftermath of the Baja California, Mexico earthquake (see figure).

[Figure: Screen shot of displacement vectors (arrows) and InSAR deformation plot from a simulation of the April 2010 Baja earthquake, produced using the QuakeSim portal's online services.]
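To give a flavor of what producing a synthetic fringe diagram involves: an interferogram wraps line-of-sight ground displacement into repeating phase cycles, one fringe per cycle of the radar wavelength. The sketch below shows only that wrapping step and is not QuakeSim's actual code; the wavelength, grid size, and Gaussian displacement bump are illustrative assumptions standing in for a fault deformation model's output.

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

// Toy synthetic-fringe sketch: wrap a line-of-sight displacement field into
// interferometric phase. For a two-way radar path the phase is 4*pi*d/lambda;
// each full 2*pi cycle corresponds to one visible fringe.
int main() {
    const double pi = std::acos(-1.0);
    const double lambda = 0.056;   // assumed radar wavelength in meters (C-band)
    const int n = 64;              // illustrative grid size
    std::vector<double> fringe(n * n);
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            double x = (i - n / 2) / double(n);
            double y = (j - n / 2) / double(n);
            // Hypothetical smooth displacement bump (meters), standing in for
            // the line-of-sight output of an earthquake deformation model.
            double d = 0.20 * std::exp(-25.0 * (x * x + y * y));
            fringe[i * n + j] = std::fmod(4.0 * pi * d / lambda, 2.0 * pi);
        }
    }
    std::printf("phase at center: %.3f rad\n", fringe[(n / 2) * n + n / 2]);
    return 0;
}
```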
 
OREChem
Funding Agency: Microsoft Research

OREChem is a collaboration between crystallographers, digital librarians, and cyberinfrastructure researchers to extend the Object Reuse and Exchange (ORE) specification to crystallography and, more generally, to the problem of integrated scientific information management. The Community Grids Lab's role is to provide expertise in service-oriented computing and Grid computing. During the current reporting period, we developed a collection of REST services for processing OREChem Atom/XML feeds, converting them into RDF triples and storing them in our RDF triple store for later search and retrieval. We also implemented services for constructing and executing computational chemistry jobs on the TeraGrid using OREChem feed information. We used OGCE tools for this service composition and execution.

[Figure: A subset of IU's OREChem services shown in the OGCE's XBaya workflow composer tool. These online services (boxes in the main canvas) are integrated to extract crystallographic information (molecular structures), create Gaussian computational chemistry input files from them, and run Gaussian jobs on the TeraGrid.]
 
 
Multicore Project
Funding Agency: Microsoft, Inc.

This project is focused on programming models and runtimes for systems of multicore computers. These programming models are useful for scientific research in a variety of biomedical areas, including genetic and drug research, as well as other data-intensive research. Initial work focused on the performance of threading versus the Message Passing Interface (MPI) in both kernels and data mining. Current major areas are biomedical applications and data-intensive technologies using Hadoop and Dryad.

MapReduce and its generalizations offer an attractive programming model for data-intensive computing. In particular, our research is using, extending, and evaluating Iterative MapReduce, which adds support for iterative problems to the core MapReduce capabilities of "map" followed by "reduce".

During the period, CGL had a major open source software release of "Twister" (http://www.iterativemapreduce.org/), developed as a novel prototype of i-MapReduce. We will continue to look at the community and commercial MapReduce systems Hadoop and Dryad, and feed our lessons back to their developers directly and through our papers. We have identified support for inhomogeneous problems (where currently the dynamic scheduling in Hadoop sometimes outperforms the static task definition in Dryad) as one key issue. A challenge for Iterative MapReduce is maintaining the dynamic fault tolerance of current systems while extending support to iterative problems with tighter synchronization constraints.
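To illustrate the iterative pattern that Twister targets, the sketch below runs a k-means-style loop in which each pass "maps" data points to per-centroid partial sums and "reduces" the partials into new centroids. This is a minimal single-process illustration of the pattern only, not Twister's actual (Java-based) API; the data and names are made up.

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

// Iterative MapReduce pattern, sketched as 1-D k-means. Each iteration:
// "map" points to per-centroid partial sums, "reduce" partials into new
// centroids, then repeat until the centroids stop moving.
struct Partial { double sum = 0.0; int count = 0; };

int main() {
    std::vector<double> points = {1.0, 1.2, 0.8, 8.0, 8.3, 7.9};
    std::vector<double> centroids = {0.0, 5.0};   // initial guesses
    for (int iter = 0; iter < 100; ++iter) {
        // Map phase: each point emits a partial sum keyed by nearest centroid.
        std::vector<Partial> partials(centroids.size());
        for (double p : points) {
            size_t best = 0;
            for (size_t k = 1; k < centroids.size(); ++k)
                if (std::fabs(p - centroids[k]) < std::fabs(p - centroids[best]))
                    best = k;
            partials[best].sum += p;
            partials[best].count += 1;
        }
        // Reduce phase: combine partials into new centroids. In Twister the
        // reduced result is broadcast back to long-running map tasks rather
        // than re-reading the input from disk on every iteration.
        double shift = 0.0;
        for (size_t k = 0; k < centroids.size(); ++k) {
            if (partials[k].count == 0) continue;
            double next = partials[k].sum / partials[k].count;
            shift += std::fabs(next - centroids[k]);
            centroids[k] = next;
        }
        if (shift < 1e-9) break;   // converged: the iteration can stop
    }
    std::printf("centroids: %.3f %.3f\n", centroids[0], centroids[1]);
    return 0;
}
```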
 
II.2.3 Open Systems Lab (Andrew Lumsdaine, Director)

The OSL mission is to develop science and technology for computing with large-scale and pervasive hardware and software systems, to enable more productive computing and software development, and to foster economic development in the State of Indiana. Work in the Open Systems Laboratory (OSL) is motivated by the changing nature of modern information technology systems.
 
 
Projects Completed During Current Reporting Period

ST-CRTS: Collaborative Research: Lifting Compiler Optimizations via Generic Programming
Principal Investigator, Co-PIs: Andrew Lumsdaine (IU), Jaakko Järvi (Texas A&M)
Funding Agency: National Science Foundation
Award Number: CCF-0541335
Award Amount: $279,233
Effective Dates: 2/15/2006 - 1/31/2010
 
 
Project Summary:

The NSF-funded research team, which includes Andrew Lumsdaine at Indiana University and Jaakko Järvi at Texas A&M University along with their collaborators, has applied the principles of generic programming to improve the optimization of computer software by compilers. Generic programming is a style of computer programming that uses non-specific basic instructions that can be tailored later to specific projects, saving time and reducing redundancy when writing code. Compilers are programs that translate source code written in one programming language into another, typically lower-level, language. The optimizer in a compiler attempts to transform a given program into one that performs faster than the original, but that still produces equivalent results. The goal of this project was to develop new programming techniques that improve software generally, for the benefit of science, business, education, and society.
 
Potential transformations often arise from general (algebraic) rules. For example, an elementary-school student learns that adding zero to any number x is an unnecessary computation, and optimizers today routinely use such a rule to eliminate unnecessary computations. A high school or college student learns that the same rule applies to adding the zero matrix to any (compatible) matrix, and indeed to the binary operation and identity element of any monoid (a mathematical structure having such a law). Optimizers today are very unlikely to take advantage of these general rules, however. Compilers' optimizers' view of programs is very "low-level," and, as a result, many optimization opportunities remain unrealized. To achieve the best performance, programmers must adapt how they write their programs, and often complicate them.

The research team has demonstrated how "high-level" general rules, algebraic laws about operators and functions, can be represented, organized, and used by compilers' optimizers. As a result, programmers can use the most suitable abstractions that help them effectively produce correct programs, and yet obtain efficient programs. The team's approach is applied to C++, a commonly used programming language, and targets generic simplification rules, removal of redundant computations and data transfers, and similar optimizations.
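As a rough illustration of the underlying idea, the sketch below expresses the monoid law as a concept in today's C++20 syntax, which differs from the ConceptC++ extensions developed in this project; the Monoid concept, the identity() member, and the Mat2 type are illustrative assumptions, not the project's actual simplifier.

```cpp
#include <concepts>
#include <cstdio>

// Sketch: a concept carrying the algebraic knowledge "x + identity == x".
// A concept-aware optimizer with access to this law as an "axiom" could
// rewrite combine(x, T::identity()) to just x for *any* monoid, lifting the
// familiar "adding zero" rule from int to matrices and beyond.
template <typename T>
concept Monoid = requires(T a, T b) {
    { a + b } -> std::convertible_to<T>;
    { T::identity() } -> std::convertible_to<T>;
};

template <Monoid T>
T combine(const T& x, const T& y) {
    // The code performs the operation; the point is that the identity law
    // would let a simplifier eliminate it entirely when y is the identity.
    return x + y;
}

struct Mat2 {   // 2x2 matrices under addition form a monoid
    double a[4];
    static Mat2 identity() { return {{0, 0, 0, 0}}; }   // the zero matrix
    Mat2 operator+(const Mat2& o) const {
        Mat2 r;
        for (int i = 0; i < 4; ++i) r.a[i] = a[i] + o.a[i];
        return r;
    }
};

int main() {
    Mat2 m = {{1, 2, 3, 4}};
    Mat2 r = combine(m, Mat2::identity());   // semantically just `m`
    std::printf("%g %g %g %g\n", r.a[0], r.a[1], r.a[2], r.a[3]);
    return 0;
}
```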
 
Intellectual Merit:

Compiler research has almost exhausted the optimization opportunities for generally applicable compiler optimizations based on properties of low-level operations. High-level domain- or library-specific optimizations, on the other hand, are costly to implement and integrate into a compiler infrastructure, and thus seldom justified. This project strikes a balance between these two approaches, and offers an economical approach to high-level optimizations.

The research team's work on structuring high-level optimizations leverages the principles of generic programming, in particular, the categorization of types into concepts according to their capabilities. Defining algorithms in terms of concepts gives rise to generic algorithms that can operate on objects of many different types. Defining optimizations in terms of concepts similarly gives rise to generic optimizations that apply to operations over many different types.

To enable the expression of generic, concept-based optimizations, an NSF-funded research team had worked toward direct language support for generic programming in C++. This work resulted in the "ConceptC++" extensions to C++, designed together with many collaborators, and in the ConceptGCC compiler developed by Doug Gregor within this project. Building on these achievements, the research team developed a generic simplifier whose transformations are guided by concepts and the "axioms" contained within them. The team also devised two prototype languages for writing compiler optimizations (and analyses) generically, and thus reusing them across different types.
 
Broader Impact:

Software is important in most aspects of modern life; improvements in software development methods and tools that make it easier to develop more efficient software thus translate into benefits to society, from scientific research to business and economics. The work in this project directly impacts the future development of mainstream programming languages that support generic programming, C++ in particular, and of their standard libraries. The research team members participate in programming language standards bodies, collaborate with language and compiler implementers, and work to introduce language features that better support generic programming.

The project has directly trained graduate students and postdoctoral researchers in the emerging field of generic programming, and will continue to do so in the future. Results from this research are integrated into graduate programming courses. Overall, the project advances discovery and understanding by freeing researchers and software professionals to focus more on the solutions to their scientific and development problems.

This work also created infrastructure, namely concept-enabled compilers, that will enable future research in generic programming. The research team has published several papers on work performed within the project, given various presentations at academic conferences, and made software artifacts available for others to use.
 
Transformative Nature of Research:

Today's mainstream compilers are considered impenetrable "black boxes" by programmers. Producing programs that perform efficiently may often require iteratively modifying the program in small ways to coerce the compiler into producing a fast executable. Optimizations are not under the control of the programmer. With the approach researched, developed, and advocated in this project, programmers are given this control. The end result is that the iterative tuning process can be drastically reduced and thus sped up, translating into greatly increased programming productivity. Moreover, when programmers no longer need to abandon their high-level abstractions to obtain performance, it is possible to express more complex problems. Ultimately, this can transform the way in which programmers approach optimizing their programs and their libraries.
 
Projects Achieving Major Milestones during the Reporting Period

Coordinated Fault Tolerance for High Performance Computing
Funding Agency: U.S. Department of Energy

We have focused our efforts in Open MPI on reliability improvements and on expanding support for the CIFTS Fault Tolerance Backplane (FTB). As part of the reliability improvements, we matured the process fault recovery operations to support run-through stabilization, reactive automatic recovery, and proactive process migration. The first option supports continuing research into fault tolerant MPI semantics and applications that can continue processing even though some processes may have failed. At Supercomputing 2009, we demonstrated a fault tolerant version of POV-Ray using Open MPI's stabilization feature and the CIFTS FTB. The proactive process migration feature allows end users to move processes away from predicted failures and planned system outages. The reactive automatic recovery feature provides end users with a transparent, automatic recovery mechanism when an unexpected process failure occurs. As part of our expanding support for the CIFTS FTB, we have improved the internal error reporting mechanisms by adding a stable reporting interface, called OPAL SOS, which can report directly to the FTB. Additionally, we have been collaborating with CIFTS FTB partners to standardize fault events and workflows to enhance the overall resiliency of HPC systems by encouraging adoption of the FTB. Alongside this work, we added support for checkpoint/restart-based parallel debugging in Open MPI that can dramatically shorten the debugging cycle, saving software developers hours or days of time spent debugging.
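The premise behind run-through stabilization is that an application can observe a failure as an error code and decide how to proceed rather than being aborted. Below is a minimal sketch of that control flow using only standard MPI error handlers; this is stock MPI, not Open MPI's resilience extensions or the CIFTS FTB interface, and the recovery step is left as a comment.

```cpp
#include <cstdio>
#include <mpi.h>

// Minimal "don't abort on failure" control flow with standard MPI error
// handlers. Real run-through stabilization goes much further (repairing
// communicators, reporting events to the FTB); this only shows an
// application choosing to observe errors instead of dying.
int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    // The default handler is MPI_ERRORS_ARE_FATAL; ask for error codes instead.
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double local = rank + 1.0, sum = 0.0;
    int rc = MPI_Allreduce(&local, &sum, 1, MPI_DOUBLE, MPI_SUM,
                           MPI_COMM_WORLD);
    if (rc != MPI_SUCCESS) {
        // A resilient application would trigger recovery here (for example,
        // rebuilding the communicator) rather than calling MPI_Abort.
        char msg[MPI_MAX_ERROR_STRING]; int len = 0;
        MPI_Error_string(rc, msg, &len);
        std::fprintf(stderr, "rank %d saw failure: %s\n", rank, msg);
    } else if (rank == 0) {
        std::printf("sum = %g\n", sum);
    }
    MPI_Finalize();
    return 0;
}
```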
 
 
Open Source Cluster Application Resources (OSCAR)

We have restructured the main framework of OSCAR since version 6.0.x to make the system more reliable and to lower the learning curve for developers who want to participate in the programming. The renovation of the OSCAR main framework is done and the trunk of the OSCAR SVN repository has been stabilized. As OSCAR 6.0.x promised, 'yum install oscar' works with the OSCAR-specific repository setup in OSCAR 6.0.5. Meanwhile, we still have to test all the features of the new release, which depends on the OSCAR community's help, and we need to find a way to test OSCAR systematically. We believe that systematic testing should be considered for the new release even though it has nothing to do with the new features of OSCAR. With systematic testing in place for the usual OSCAR releases, we will be able to focus on supporting more distros and platforms. We support RHEL5 (x86, x86-64), Debian (x86, x86-64), and Ubuntu (x86, x86-64) so far.
 
 
Development and Improvement of a Tissue-Simulation Environment
Funding Agency: National Institutes of Health

The OSL is collaborating with the Biocomplexity Institute at Indiana University to provide an open-source, multiscale modeling environment for cell-based modeling of the development, structure, behavior, and pathologies of tissues and organs: the Tissue-Simulation Environment (TSE). The TSE will build upon the current Cellular Potts Model-based modeling environment, CompuCell3D, and the Systems Biology Workbench to allow simple model development by both modelers and experimentalists, provide a framework for model sharing, support SBML and CellML, and allow transparent selection of the level of modeling detail. The software will include graphical user interfaces and support for parallel computing.

During this reporting period, we developed and demonstrated a solution for performing parameter studies of CompuCell3D models that combined workflows and IU's Big Red supercomputer to perform the simulations. The open source software VisTrails (vistrails.org) was used to construct the workflows and handle data provenance. One workflow automatically constructed multiple sets of parameter values and remotely invoked (via the Globus Toolkit) simultaneous CompuCell3D jobs on Big Red. Another workflow retrieved the resulting output data and rendered images in VisTrails. This project offered a valuable alternative to the traditional, workstation-based CompuCell3D application.
 
 
 
[Figure: Portion of VisTrails workflow (left) and resulting spreadsheet of cell sorting simulations (right) from a parameter study run on Big Red.]
 
 
 
Causal Connectivity and Computations in Hundreds of Neurons in Cortex
Funding Agency: National Science Foundation

The OSL is collaborating with John Beggs (Physics, IU) to determine "causal" connectivity between hundreds of neurons in cortical networks and to determine computational operations in neurons where causal connections converge. Causal connectivity has been conceptualized in many ways, but this project adopts the definition given by Norbert Wiener: "For two simultaneously measured signals, if we can predict the first signal better by using the past information from the second one than by using the information without it, then we call the second signal causal to the first one." In this sense, the field does not use the term "causal" literally, but to indicate predictive value. As a directional measure, causal connectivity cannot be deduced merely from non-directional measures like correlations or firing rates.
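Wiener's definition translates directly into a computation: fit one predictor of a signal x from x's own past, fit a second predictor that also uses y's past, and compare the residual errors. The sketch below is a minimal illustration of that comparison on synthetic data using least squares; it is not the project's analysis code, and the signal model, lag depth, and coefficients are all assumptions.

```cpp
#include <cmath>
#include <cstdio>
#include <random>
#include <vector>

// Wiener/Granger-style predictive test: does knowing y's past improve the
// prediction of x beyond x's own past? Two zero-mean linear predictors are
// fit by least squares (one lag, no intercept) and their errors compared.
int main() {
    const int n = 5000;
    std::mt19937 rng(42);
    std::normal_distribution<double> noise(0.0, 1.0);
    std::vector<double> x(n, 0.0), y(n, 0.0);
    for (int t = 1; t < n; ++t) {
        y[t] = noise(rng);                                           // y: white noise
        x[t] = 0.5 * x[t-1] + 0.8 * y[t-1] + 0.3 * noise(rng);       // y drives x
    }
    // Model A: x[t] ~ a * x[t-1]  (closed-form least squares)
    double sxx = 0, sxy = 0;
    for (int t = 1; t < n; ++t) { sxx += x[t-1]*x[t-1]; sxy += x[t-1]*x[t]; }
    double a = sxy / sxx;
    // Model B: x[t] ~ b1 * x[t-1] + b2 * y[t-1]  (2x2 normal equations)
    double s11 = 0, s12 = 0, s22 = 0, r1 = 0, r2 = 0;
    for (int t = 1; t < n; ++t) {
        s11 += x[t-1]*x[t-1]; s12 += x[t-1]*y[t-1]; s22 += y[t-1]*y[t-1];
        r1  += x[t-1]*x[t];   r2  += y[t-1]*x[t];
    }
    double det = s11 * s22 - s12 * s12;
    double b1 = (s22 * r1 - s12 * r2) / det;
    double b2 = (s11 * r2 - s12 * r1) / det;
    // Compare residual errors of the two predictors.
    double ea = 0, eb = 0;
    for (int t = 1; t < n; ++t) {
        double da = x[t] - a * x[t-1];
        double db = x[t] - b1 * x[t-1] - b2 * y[t-1];
        ea += da * da; eb += db * db;
    }
    // If y helps predict x, eb is much smaller than ea and the log-ratio is
    // clearly positive: in Wiener's sense, y is "causal" (predictive) for x.
    std::printf("error without y: %.3f  with y: %.3f  log-ratio: %.3f\n",
                ea / n, eb / n, std::log(ea / eb));
    return 0;
}
```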
 
During this reporting period, we have obtained some initial data from the experimentalists and have begun developing software applications for visualization and analysis. Our goal is to provide open source tools that neuroscience researchers are able to freely download, use, and extend. As the datasets grow in size, we will want to apply high-performance computing to the analysis.
 
 
The  open  source  ParaView  application  to  visualize  neuron  firing  data.
 
 
 
 
22
 
 
Movie  frames  of  neuron  
firing:  frames  N  and  N+3  (3  ms  later),  depicting  possible  correlation  between  one  neuron  and  
adjacent  neurons  (yellow  circle).
 
 
A Declarative Approach to Managing the Complexity of Massively Parallel Programs
Funding Agency: National Science Foundation

Our current focus is on identifying and experimenting with declarative abstractions that make it easier to write parallel codes, especially when programming in the Bulk Synchronous Parallel (BSP) style. We have continued to explore the parallel algorithms exemplified by the thirteen Berkeley Dwarfs, and have examined how their communication patterns might be expressed declaratively. We recently finished the preliminary design of Kanor, our declarative parallel programming language; Kanor's declarative communication constructs are based on list comprehensions and array slices. We have implemented a prototype of Kanor as a C++ template library, and have begun porting the Berkeley Dwarfs from MPI to Kanor. Our initial experience is that Kanor communication code is shorter and simpler than the MPI equivalent, at least for codes written in the BSP style.
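For context, the sketch below shows the kind of explicit BSP-style MPI exchange, a ring shift followed by a superstep barrier, that Kanor's comprehension-like communication constructs aim to state declaratively as "who gets what from whom." Kanor's actual syntax is not reproduced here; this is plain MPI, and the values exchanged are illustrative.

```cpp
#include <cstdio>
#include <mpi.h>

// One BSP superstep in plain MPI: every rank sends its value to the next
// rank in a ring, then all ranks synchronize. The programmer must address,
// order, and match the sends and receives by hand; this choreography is
// what a declarative communication rule would replace.
int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, size = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int right = (rank + 1) % size;
    int left  = (rank + size - 1) % size;
    double mine = rank * 10.0, theirs = -1.0;

    // Communication phase of the superstep: a matched send/recv pair.
    MPI_Sendrecv(&mine, 1, MPI_DOUBLE, right, 0,
                 &theirs, 1, MPI_DOUBLE, left, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    MPI_Barrier(MPI_COMM_WORLD);   // superstep boundary
    std::printf("rank %d received %g from rank %d\n", rank, theirs, left);
    MPI_Finalize();
    return 0;
}
```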
 
 
II.2.4 Complex Networks and Systems Group (Alessandro Vespignani, Director)

The Complex Networks and Systems Group (CNetS) is hosted at the IU School of Informatics and brings together faculty from different units across campus working in the broad areas of complex networks and systems. The group's activities include modeling and mining of complex information, technological and social networks, agent-based systems, computational social sciences, artificial life, computational epidemiology, and more. The group receives funds from the Lilly Endowment through PTI, the NSF, the NIH, and a number of private foundations and corporations.
 
 
Projects Completed During Current Reporting Period

Designing an Effective HIV Prevention Plan for Botswana by Coupling an Information Network Model with a Meta-population Transmission Model
Principal Investigator: Alessandro Vespignani
Funding Agency: University of California, Los Angeles (UCLA)
Award Number: Subaward 2000 G MF 329
Award Amount: $37,500
Effective Dates: May 1, 2009 - April 30, 2010

Project Summary:
 
We have used an interdisciplinary approach to design a novel theoretical framework, based on network science, that will aid in developing effective health policies for controlling the HIV epidemic in Botswana, a resource-constrained country in Sub-Saharan Africa. Our research links mathematics, physics, epidemiology, public policy, and public health. We decided, as the initial stage, to take a complex biological model that the collaborating group of Professor Blower published in Science in January 2010 (Science. 2010 Feb 5;327(5966):697-701) and to add a network structure linking individuals in the model. We then plan to expand this network model so that it reflects heterosexual transmission, and to apply the model to Botswana.

The funding that we were awarded from the National Academies Keck Futures Initiative (NAKFI) is helping us collect preliminary results and identify new research directions. Once we obtain preliminary results we will seek future support from the National Institute of Allergy and Infectious Diseases (NIAID) at the National Institutes of Health (NIH).
 
Intellectual Merit:

The project is working on the development of a new class of models to examine the effect of network dynamics on the spread of drug-resistant strains of HIV. The intention is to create an understanding of how and where the virus is spreading and how it is likely to spread in the future, in order to create a targeted approach to prevention and treatment of HIV.

Broader Impact:

Our research goals are only achievable through an interdisciplinary collaboration between specialists in very different fields. The collaboration that we have been able to form, through NAKFI funding, is truly interdisciplinary and synergistic.

Transformative Nature of Research:

This is the first approach to the problem that will contain insights based on our new interdisciplinary methodology using network science. It can potentially change how HIV is managed in Botswana.

Societal Benefits:

According to the international AIDS charity AVERT, Botswana is among the hardest hit places on earth, with an estimated one in four adults living with HIV. Average life expectancy in Botswana is currently less than forty years. Modeling the disease in Botswana will lead to health policies that will save lives. Improving the techniques used in modeling and predicting the spread of infectious disease can also improve treatment and prevention of disease worldwide.
 
How Network Structure Gives Rise to Dynamical Complexity
Principal Investigator, Co-PIs: Larry Yaeger (PI), Olaf Sporns (Co-PI)
 
Funding Agency: National Academies Keck Futures Initiative
Award Number: NAKFI CS22
Award Amount: $50,000
Effective Dates: May 1, 2009 - May 31, 2010
 
 
Project Summary:
 
 
We are applying a combination of an information-theoretic measure of neural complexity and network-science/graph-theoretic tools to the neural dynamics and network topologies of artificial neural networks evolved to control agents in a computational ecosystem. This research will be useful in developing new and better types of artificial intelligence. The specific goal is to understand the relationship between network structure and network function in general and, specifically, to determine which structural characteristics are most predictive of, and most likely to confer, dynamical complexity in artificial neural networks. We have demonstrated a relationship between increasing clustering coefficient, decreasing path length, and a bias towards small-world networks that is directly correlated with increasing neural complexity during a period of behavioral adaptation to the environment. This suggests a convergence between evolution for network functionality and previously elucidated evolution for physical constraints, such as wiring length and brain volume.
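
As a rough illustration of the two graph measures involved, the following minimal sketch (Python with the networkx library, not our actual analysis pipeline; graph sizes and rewiring probabilities are arbitrary assumptions) computes the clustering coefficient and characteristic path length of a toy network and compares them against a fully rewired random baseline, the usual informal test for small-world structure:

    import networkx as nx

    # Toy stand-in for an evolved network topology: a connected Watts-Strogatz
    # small-world graph (30 nodes, 4 neighbors each, 10% rewiring).
    g = nx.connected_watts_strogatz_graph(n=30, k=4, p=0.1, seed=42)

    # The two measures discussed in the text.
    cc = nx.average_clustering(g)
    pl = nx.average_shortest_path_length(g)

    # A fully rewired (p=1.0) graph serves as a crude random baseline:
    # small-world networks combine clustering well above random with a
    # characteristic path length close to random.
    r = nx.connected_watts_strogatz_graph(n=30, k=4, p=1.0, seed=42)
    print(f"clustering:  {cc:.3f} vs random {nx.average_clustering(r):.3f}")
    print(f"path length: {pl:.3f} vs random {nx.average_shortest_path_length(r):.3f}")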
 
 
Intellectual  Merit:    
 
We combine a sophisticated artificial life model (Polyworld) with a powerful collection of graph-theoretical tools (the Brain Connectivity Toolbox) and the gold-standard information-theoretic complexity metric (Tononi, Sporns, Edelman). The computational model has been designed to force natural selection to evolve the statistics of network connectivity, rather than specific network designs, and it records all network topologies and neural dynamics. By evolving the agents in an environment with heterogeneous resources, we are able to identify periods of behavioral adaptation to the environment as the population approaches an Ideal Free Distribution, and to focus our attention on complexity growth and changes in network topology during these periods.
 
Broader  Impact:    
 
We have developed a C++ version of the (MATLAB) Brain Connectivity Toolbox (BCT), speeding it up by approximately a factor of 30. It is available at http://code.google.com/p/bct-cpp/. We have also provided wrappers for Python calls into this library, and we expect to provide wrappers for other languages in the future. There are a substantial number of users of the original BCT, with applications ranging from its intended purpose of neural network analysis to the design of a lens for a space telescope, so we expect to have a significant impact on the broader scientific community.
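
For readers unfamiliar with the toolbox, the sketch below reimplements one representative BCT-style measure, the per-node clustering coefficient of a binary undirected graph, from its standard definition in plain numpy. This is an illustrative stand-in written for this report, not code taken from BCT or bct-cpp:

    import numpy as np

    def clustering_coef_bu(adj):
        # Per-node clustering coefficient of a binary undirected graph;
        # an illustrative reimplementation of a standard BCT-style measure.
        adj = np.asarray(adj, dtype=float)
        deg = adj.sum(axis=1)
        # diag(A^3) counts closed 3-walks; each triangle is counted twice.
        triangles = np.diagonal(adj @ adj @ adj) / 2.0
        possible = deg * (deg - 1) / 2.0  # neighbor pairs that could be linked
        with np.errstate(divide="ignore", invalid="ignore"):
            return np.where(possible > 0, triangles / possible, 0.0)

    # Toy example: a 5-node ring with one chord, forming a single triangle.
    A = np.zeros((5, 5), dtype=int)
    for i in range(5):
        A[i, (i + 1) % 5] = A[(i + 1) % 5, i] = 1
    A[0, 2] = A[2, 0] = 1
    print(clustering_coef_bu(A))  # ~[0.33, 1.0, 0.33, 0.0, 0.0]

The C++ version provides measures of this kind at roughly 30 times the speed of the MATLAB original, which matters for large adjacency matrices and long evolutionary runs.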
 
Transformative Nature of Research:
Improvements in our understanding of the relationship between network structure and network function may impact many fields of science. Knowledge of the specific topological features associated with high dynamical complexity may allow us to shape the search space for evolution in such a way as to promote higher levels of artificial intelligence in shorter timeframes.
 
Societal Benefits:
The dynamics of social networks are being used to illuminate everything from online communication to file sharing to the spread of disease. Though our work is not currently targeted at diagnosis, it is possible that structural breakdowns resulting in neurological disorder could be better discovered and understood given the insights we are generating.
 
 
Projects  Achieving  Major  Milestones
 
 
Global Epidemic and Mobility Model
Funding Agency: National Institutes of Health, U.S. Defense Threat Reduction Agency, Abbott, ISI Foundation
 
 
The Global Epidemic and Mobility (GLEaM) model provides real-time forecasts of the unfolding of the H1N1 epidemic worldwide. This modeling effort has been unique in that it has been the only one attempting to obtain quantitative results worldwide. The need for new ways to obtain realistic estimates of the disease parameters pushed the team to develop a new methodology that performs a likelihood analysis of the model with respect to chronological data on the diffusion process. This methodology allowed us to obtain early estimates of the transmission potential of the H1N1 virus by taking advantage of the multi-scale diffusion processes defined by the population mobility networks. This is the only model coupling countries worldwide, and this feature is extremely relevant in evaluating the time pattern of emerging infectious diseases. The early results have been validated by a posteriori analysis against the real data collected by the CDC in May and June. The agreement between the predictions and the actual unfolding of the pandemic has proven remarkable. The GLEaM approach was then used in June and July to provide long-term predictions of the epidemic activity peak in Northern Hemisphere countries in the winter. The method anticipated an early peak occurring in October/November in most countries. The predictions, which are quantitative in nature (peak week and relative 95% reference range), were published in early September in BMC Medicine. This is the only paper so far that has attempted a quantitative forecast of the activity peaks. The predictions contained in the paper have been validated since January 2010 against real data provided by agencies in more than 40 countries. The results show very good agreement between predictions and real data, with an offset of at most two weeks. These findings provide a strong and remarkable test of the quantitative accuracy of the predictions offered by computational methods.
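
As a greatly simplified sketch of what a likelihood analysis against chronological data can look like, consider fitting a single growth-rate parameter to weekly case counts by maximizing a Poisson log-likelihood. This is not the GLEaM methodology itself, which evaluates the likelihood of simulated case arrival times across the worldwide mobility network; the exponential-growth model, the Poisson error model, and the synthetic counts below are assumptions chosen purely for illustration:

    import math

    # Synthetic weekly case counts standing in for chronological
    # surveillance data; purely illustrative numbers.
    observed = [3, 5, 9, 16, 27, 44]

    def poisson_loglik(growth_rate, counts, c0=3.0):
        # Log-likelihood of counts under expected growth c0 * exp(r * t),
        # assuming Poisson-distributed observations around that curve.
        ll = 0.0
        for t, c in enumerate(counts):
            lam = c0 * math.exp(growth_rate * t)
            ll += c * math.log(lam) - lam - math.lgamma(c + 1)
        return ll

    # Grid search over candidate growth rates; the maximizer is the
    # maximum-likelihood estimate.
    grid = [r / 100 for r in range(1, 101)]
    best = max(grid, key=lambda r: poisson_loglik(r, observed))
    print(f"maximum-likelihood growth rate ~ {best:.2f} per week")

In the same spirit, GLEaM's likelihood analysis selects the disease parameters, such as the transmission potential, that make the observed chronology of the epidemic most probable under the simulated multi-scale diffusion process.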
 
 
Caption: Epidemic activity worldwide on Oct 26, 2009, according to the GLEaM computational platform. The color scale indicates the number of infected people.
 
 
 
 
 
 
 
 
II.3 Educational Activities and Workforce Development
 
The following students from the Digital Science Center completed degrees during the reporting period.
 
Student Name            Degree Type                  Lab
Sashikiran Challa       MS in Cheminformatics        Community Grids Lab
Jun Ji                  MS in Computer Science       Community Grids Lab
Karthik Muthuraman      MS in Bioinformatics         Community Grids Lab
Jaliya Ekanayake        PhD in Computer Science      Community Grids Lab
Tak-Lon Wu              MS in Computer Science       Community Grids Lab
Mark Meiss              PhD in Computer Science      CNetS
Diep Hoang              MS in Computer Science       CNetS
Matthew Whitehead       PhD in Computer Science      CNetS
Prabhanjan Kambadur     PhD in Computer Science      Open Systems Lab
 
 
The following table lists employees hired by, or departing, the Digital Science Center during the reporting period.
 
Name              Lab                   Status
Fugang Wang       Community Grids Lab   Hired
Andrew Younge     Community Grids Lab   Hired
Quenrui Cai       Community Grids Lab   Hired
Scott Beason      Community Grids Lab   Terminated
Adam Hughes       Community Grids Lab   Hired
John McCurley     CNetS                 Hired
Snehal Patil      CNetS                 Hired
Ying Wang         CNetS                 Hired
Torsten Hoefler   Open Systems Lab      Left OSL to work for the Blue Waters Directorate, National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign
William Byrd      Open Systems Lab      Hired as Post Doc Associate
 
 
 
 
 
 
 
 
III