
Cyberinfrastructure

Geoffrey Fox

Indiana University

with

Linda Hayden

Elizabeth City State University

April 5 2011 Virtual meeting



Cyberinfrastructure


Supports the Expeditions with lightweight field-system hardware and system support


Then perform offline processing at Kansas, Indiana and ECSU


Indiana and ECSU facilities and initial field work funded by the NSF
PolarGrid MRI, which is now (essentially) completed


Initial basic processing to Level 1B


Extension to Level 3 (L3) with an image-processing and data-exploration
environment


Data is archived at NSIDC

Prasad Gogineni:
"With the on-site processing capabilities provided by PolarGrid, we are
able to quickly identify Radio Frequency Interference (RFI) related problems and
develop appropriate mitigation techniques. Also, the on-site processing capability
allows us to process and post data to our website within 24 hours after a flight is
completed. This enables scientific and technical personnel in the continental United
States to evaluate the results and provide the field team with near real-time feedback on
the quality of the data. The review of results also allows us to re-plan and re-fly critical
areas of interest in a timely manner."

IU Field Support Efforts 2010


OIB Greenland 2010


RAID-based data backup solution


Second server to handle processing needs


Over 50 TB collected on-site


Copying to Data Capacitor completed at IU in Feb 2011


OIB Punta Arenas 2010


20 TB using the same backup solution

IU Field Support, Spring 2011


OIB and Twin Otter flights
simultaneously, two
engineers in the field


The most equipment IU has
sent to the field in any
season


processing and data transfer
server at each site


two arrays at each field site


Largest set of data
capture/backup jobs yet
between CReSIS/IU

Field Equipment in detail


OIB Thule:
three 2U 8-core servers, three 24 TB SATA arrays, 12
cases of 40 1.5 TB drives


Ilulissat:
two 2U 8-core servers, two 24 TB SATA arrays, 6
cases of 40 1.5 TB drives


2010 Chile:
three 2U 8-core servers, three 24 TB SATA arrays, 6
cases of drives


2010 Thule-to-Kanger:
one 2U 8-core server, one 24 TB SATA
array, 6 cases of drives in Thule, 5 in Kanger. Drives in
Thule-to-Kanger were reused drives from earlier Antarctic
work, and 3 cases failed in Thule.


Note: 100 drives have failed in total so far (it's harsh out there)

IU Lower 48 support


2010 data now on Data
Capacitor


Able to route around local
issues if necessary, by
substituting other local
hardware temporarily


Turnaround/management of IU affiliate accounts for CReSIS
researchers and students


Some tuning of Crevasse (major PolarGrid system at IU)
nodes for better job execution/turnaround complete

Education and
Cyberinfrastructure

Summer 2010 Cyberinfrastructure REU


Joyce Bevins
Data Point Visualization and Clustering Analysis
Mentors: Jong Youl Choi, Ruan Yang, and Seung-Hee Bae IUB


Jean Bevins
Creating a Security Model for SALSA HPC Portal
Mentors: Adam Hughes, Saliya Ekanayake IUB


JerNettie Burney and Nadirah Cogbill
Evaluation of Cloud Storage
for Preservation and Distribution of Polar Data
Mentors: Marlon
Pierce, Yu (Marie) Ma, Xiaoming Gao, and Jun Wang IUB


Constance Williams
Health Data Analysis
Mentor: Jong Youl Choi
IUB


Robyn Evans and Michael Austin
Visualization of Ice Sheet
Elevation Data Using Google Earth & Python Plotting Libraries
Mentors: Marlon Pierce, Yu (Marie) Ma, Xiaoming Gao, and Jun
Wang IUB



Academic Year Student Projects


A Comparison of Job Duration Utilizing High Performance
Computing on a Distributed Grid Members:
Michael Austin, JerNettie
Burney, and Robyn Evans Mentor: Je'aime Powell


Research and Implementation of Data Submission Technologies in
Support of CReSIS Polar and Cyberinfrastructure Research Projects
at Elizabeth City State University
Team Members: Nadirah Cogbill,
Matravia Seymore; Team Mentor: Jeff Wood, with mentors Xiaoming
Gao, Yu "Marie" Ma, Marlon Pierce, and Jun Wang at IU


JerNettie Burney, Glenn Koch, Jean Bevins, Cedric Hall
A Study on
the Viability of Hadoop Usage on the Umfort Cluster for the
Processing and Storage of CReSIS Polar
Data. Mentor: Je'aime
Powell


Other Education Activities


Two ADMI faculty, one graduate student and one
undergraduate student participated in the Cloud Computing
Conference CloudCom2010 in Indianapolis December 2010


Fox presented at the ADMI Cloud Computing workshop for faculty,
December 16, 2010

Jerome Mitchell (IU PhD, ECSU UG, Kansas Masters) will describe the
"A Cloudy View on Computing" workshop at ECSU, June 2011


Supporting Higher Level
Data Products


Image Processing


Data Browsing Portal from Cloud


Standalone Data Access in the field


Visualization

Hidden Markov Method-based Layer Finding

P. Felzenszwalb, O. Veksler, Tiered Scene Labeling with Dynamic Programming,
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010
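The layer finding above rests on dynamic programming over per-column costs. As a rough sketch (not the tiered scene-labeling formulation of the cited paper, and with an illustrative cost matrix), a generic Viterbi-style layer tracker picks one layer depth per radar column, trading a data cost against a smoothness penalty between adjacent columns:

```python
# Generic dynamic-programming (Viterbi-style) layer finder: for each radar
# column t, choose a layer depth d minimizing cost[t][d] plus a penalty on
# depth jumps between neighboring columns. Simplified sketch only.

def find_layer(cost, smooth=1.0):
    """cost[t][d]: penalty for placing the layer at depth d in column t.
    Returns the minimum-cost depth sequence, one depth per column."""
    T, D = len(cost), len(cost[0])
    best = list(cost[0])          # best cumulative cost ending at each depth
    back = []                     # backpointers for path recovery
    for t in range(1, T):
        prev, best = best, [0.0] * D
        ptr = [0] * D
        for d in range(D):
            # transition: penalize jumps in layer depth between columns
            c, ptr[d] = min((prev[p] + smooth * abs(d - p), p)
                            for p in range(D))
            best[d] = cost[t][d] + c
        back.append(ptr)
    d = best.index(min(best))
    path = [d]
    for ptr in reversed(back):    # trace the optimal path backwards
        d = ptr[d]
        path.append(d)
    return path[::-1]
```

In the real system the per-column cost would come from radar echogram intensities; the smoothness weight controls how sharply the detected layer may bend.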

Current CReSIS Data Organization


The data are organized by season. Seasons are broken
into data segments, which are contiguous blocks of
data where the radar parameters do not change.


Data segments are broken into frames (typically
50 km in length). Associated data for each frame
are stored in different file formats: CSV (flight
path), MAT (depth sounder data), and PDF (image
products).


The CReSIS data products website lists direct
download links for individual files.
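The season, segment, frame hierarchy described above might be modeled as follows; the class and frame-ID naming here is a hypothetical illustration, not CReSIS's actual convention:

```python
from dataclasses import dataclass, field

# Sketch of the season -> segment -> frame hierarchy. A segment groups
# contiguous data with fixed radar parameters; each frame carries its
# associated files (CSV flight path, MAT sounder data, PDF images).

@dataclass
class Frame:
    frame_id: str                                # e.g. "20091102_01_003" (illustrative)
    files: dict = field(default_factory=dict)    # format -> file path

@dataclass
class Segment:
    segment_id: str                              # radar parameters constant within
    frames: list = field(default_factory=list)

@dataclass
class Season:
    name: str                                    # e.g. "2009_Antarctica" (illustrative)
    segments: list = field(default_factory=list)

def add_frame_file(season, segment_id, frame_id, fmt, path):
    """Attach a data file to its frame, creating segment/frame as needed."""
    seg = next((s for s in season.segments if s.segment_id == segment_id), None)
    if seg is None:
        seg = Segment(segment_id)
        season.segments.append(seg)
    frm = next((f for f in seg.frames if f.frame_id == frame_id), None)
    if frm is None:
        frm = Frame(frame_id)
        seg.frames.append(frm)
    frm.files[fmt] = path
```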

PolarGrid Data Browser Goals


Organize the data files by their spatial attributes.


Support multiple protocols for different user
groups, such as KML service and direct spatial
database access.


Support efficient access methods in different
computing and network environments.


Cloud and Field (standalone) versions


Support high-level spatial analysis functions
powered by a spatial database
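One of the protocols listed above is a KML service for Google Earth users. A minimal sketch of serving a frame's flight path as a KML LineString (coordinates and the frame name are illustrative, not actual CReSIS data):

```python
import xml.etree.ElementTree as ET

# Build a minimal KML document with one Placemark/LineString per frame
# flight path, the kind of payload a KML distribution service would emit.

def frame_to_kml(frame_name, coords):
    """frame_name: display name; coords: list of (lon, lat) tuples.
    Returns the KML document as a string."""
    ns = "http://www.opengis.net/kml/2.2"
    kml = ET.Element("{%s}kml" % ns)
    doc = ET.SubElement(kml, "{%s}Document" % ns)
    pm = ET.SubElement(doc, "{%s}Placemark" % ns)
    ET.SubElement(pm, "{%s}name" % ns).text = frame_name
    line = ET.SubElement(pm, "{%s}LineString" % ns)
    coord_el = ET.SubElement(line, "{%s}coordinates" % ns)
    # KML coordinate order is lon,lat,altitude
    coord_el.text = " ".join("%f,%f,0" % (lon, lat) for lon, lat in coords)
    return ET.tostring(kml, encoding="unicode")
```

A real service would add a description balloon linking the frame's CSV/MAT/PDF downloads, which is how the Google Earth screenshots later in the deck expose per-frame data access.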


PolarGrid Data Browser Architecture


Two main components: a Cloud distribution service
and a special service for the PolarGrid field crew.


Data synchronization is supported among multiple
spatial databases.

[Architecture diagram: the GIS Cloud Service (GeoServer + spatial database, backed by a virtual storage service) provides cloud access for multiple users via WMS to Matlab/GIS, KML to Google Earth, and a data portal; the Field Service provides standalone field access via a SpatiaLite/SQLite spatial database packed in a virtual appliance, for a single user or multiple users on a local network.]
PolarGrid Data Browser:

Cloud GIS Distribution Service


Google Earth example: 2009 Antarctica season


Left image: overview of 2009 flight paths


Right image: data access for single frame


Technologies in

Cloud GIS Distribution Service


The geospatial server is based on GeoServer and
PostgreSQL (spatial database), and is configured
inside an Ubuntu virtual machine.


The virtual storage service attaches terabyte-scale
storage to the virtual machine.


The Web Map Service (WMS) protocol enables
users to access the original data set from Matlab
and GIS software. KML distribution is aimed at
general users. The data portal is built with Google
Maps and can be embedded into any website.
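A WMS client such as Matlab or GIS software retrieves map imagery from GeoServer through GetMap requests. The following sketch builds such a request URL; the host and layer names are hypothetical placeholders, not the actual PolarGrid endpoints:

```python
from urllib.parse import urlencode

# Construct a WMS 1.1.1 GetMap request against a GeoServer instance.
# The bounding box is (minx, miny, maxx, maxy) in EPSG:4326 (lon/lat).

def wms_getmap_url(base_url, layer, bbox, width=512, height=512):
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "SRS": "EPSG:4326",
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    return base_url + "?" + urlencode(params)
```

Fetching this URL returns a rendered PNG of the layer for the requested region, which is what lets external tools overlay flight paths without downloading the underlying database.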




PolarGrid data distribution on
Google Earth


Processed on cloud using MapReduce

PolarGrid Field Access Service


The field crew has limited computing resources and internet
connectivity.


Essential data sets are downloaded from the Cloud GIS
distribution service and packed as a spatial-database
virtual appliance with SpatiaLite. The whole system can be
carried around on a USB flash drive.


The virtual appliance is built on Ubuntu JeOS (just enough
operating system); it has almost identical functions to the GIS
Cloud service and works on a local network with VirtualBox. The
virtual appliance runs with 256 MB of virtual memory.


The SpatiaLite database is a lightweight spatial database based
on SQLite. It aims at a single user; the data can be accessed
through GIS software, and a native API for Matlab has also
been developed.
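Field-side queries against the SQLite-backed database are simple enough to run on the appliance's modest resources. A sketch of a bounding-box frame lookup follows; a real SpatiaLite database would use true geometry columns and load the mod_spatialite extension (via `conn.enable_load_extension(True)` and `load_extension`), so plain min/max columns and the frame rows here are stand-in assumptions:

```python
import sqlite3

# Find frames whose stored bounding box intersects a query box, the core
# operation behind "which frames cover this area?" in the field browser.

def find_frames(conn, minx, miny, maxx, maxy):
    """Return frame ids whose bounding box intersects the query box."""
    cur = conn.execute(
        """SELECT frame_id FROM frames
           WHERE NOT (maxx < ? OR minx > ? OR maxy < ? OR miny > ?)""",
        (minx, maxx, miny, maxy))
    return [row[0] for row in cur]

# In-memory database with two illustrative frame bounding boxes.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE frames
                (frame_id TEXT, minx REAL, miny REAL, maxx REAL, maxy REAL)""")
conn.executemany("INSERT INTO frames VALUES (?,?,?,?,?)", [
    ("frame_a", -71.0, -75.5, -70.0, -75.0),
    ("frame_b", -60.0, -80.0, -59.0, -79.5),
])
```

With SpatiaLite loaded, the `WHERE` clause would instead use spatial functions (e.g. an index-assisted intersection test), but the access pattern is the same.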

PolarGrid Field Access Service


SpatiaLite data access
with Quantum
GIS interface


Left image: 2009 Antarctica season vector data, originally
stored in 828 separate files.


Right image: visual crossover analysis for quality control
(work in progress)



Use of Tiled Screens

URL References


CReSIS data products:
https://www.cresis.ku.edu/data


GeoServer:
http://geoserver.org/


PostgreSQL:
http://www.postgresql.org/


VirtualBox:
http://www.virtualbox.org/


SpatiaLite:
http://www.gaia-gis.it/spatialite/


Quantum GIS:
http://www.qgis.org/