Potential use of Cloud computing for streamlining the processing of MT data Prof J Craig Mudge FTSE


Nov 3, 2013


Potential use of Cloud computing for streamlining the processing of MT data

Prof J Craig Mudge FTSE

Collaborative Cloud Computing Lab (C3L)


New eScience Lab enabled by cloud computing

Seed funding from
-- minerals and geothermal research at www.pir.sa.gov.au
-- Microsoft Research USA Jim Gray Seed Grant

JCM 30 Sept 2010

Prof Graham Heinson

Prof J Craig Mudge FTSE

Stephan Thiel

Pinaki

Chan

Jared Peacock

Wei Wang

Andrew Wendelborn

Acknowledgements:

David Giles, Richard Lane, Tim Baker, Tristan Wurst

Magnetotelluric (MT) imaging

1. Using the magnetic and electric fields of the earth, MT imaging determines the resistivity structure of a sub-surface area of interest.
2. It goes deeper (a hundred or so km) than seismic (<2 km) but does not have the same resolution.
3. Applications
   1. mineral exploration,
   2. water management in mining,
   3. geothermal exploration,
   4. carbon storage,
   5. aquifer research and management,
   6. earthquake and volcano studies.

craig.mudge@adelaide.edu.au, 27 Sep 2010

CO2 in depleted gas field (Heinson and Mudge, 2010)
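The depth claim on this slide can be made concrete with the electromagnetic skin depth, which grows with the square root of resistivity times period. The following is a minimal sketch, assuming a uniform half-space; the function name and the example resistivity and periods are illustrative and do not come from the slides.

```python
import math

MU_0 = 4e-7 * math.pi  # magnetic permeability of free space, H/m


def skin_depth_m(resistivity_ohm_m, period_s):
    """Skin depth in metres for a uniform half-space: sqrt(2*rho / (omega*mu0))."""
    omega = 2.0 * math.pi / period_s
    return math.sqrt(2.0 * resistivity_ohm_m / (omega * MU_0))


# Illustrative values only: a 100 ohm-metre half-space at three periods.
for period in (0.01, 1.0, 1000.0):
    depth_km = skin_depth_m(100.0, period) / 1000.0
    print(f"period {period} s: skin depth about {depth_km:.1f} km")
```

Under these assumptions, long-period signals (around 1000 s) reach depths on the order of a hundred kilometres, while short periods sample only the near surface.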

Overview of cloud computing


Ahead for research in minerals and energy

1. Data deluge
   - Gene sequencers: 25 Terabytes per day
   - Large Hadron Collider: 700 MB of data per second, 60 TB/day, 20 PB/year
   - Square Kilometre Array: Petabytes per day
2. Computation, e.g., rapid inversion
3. Data and experiments: curation, provenance, sharing, reuse
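The data-rate figures quoted on this slide (700 MB per second, 60 TB/day, 20 PB/year) can be checked with simple unit arithmetic. This is a small sketch assuming decimal units (1 TB = 1,000,000 MB, 1 PB = 1,000 TB) and a rate sustained all year; the function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def mb_per_s_to_tb_per_day(mb_per_s):
    """Convert a sustained rate in MB/s to TB/day (decimal units)."""
    return mb_per_s * SECONDS_PER_DAY / 1_000_000


def tb_per_day_to_pb_per_year(tb_per_day):
    """Convert TB/day to PB/year, assuming the rate holds for 365 days."""
    return tb_per_day * 365 / 1_000


tb_day = mb_per_s_to_tb_per_day(700)
pb_year = tb_per_day_to_pb_per_year(tb_day)
print(f"{tb_day:.2f} TB/day, {pb_year:.1f} PB/year")
```

Under these assumptions 700 MB/s works out to about 60 TB/day and about 22 PB/year, the same order as the 20 PB/year quoted.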

Approx 100,000 PCs in Google data centres

[Photos: Google Goose Creek; Google Dalles Oregon]

From www.cloudinnovation.com.au

Essence of cloud

1. software as a service: applications are delivered over the Internet with a common-or-garden browser
2. significant cost savings, factors of 5x to 7x
3. presented as a utility, with a matching business model, namely pay-per-use
4. a new data-parallel programming framework

Cost savings in warehouse-sized data centres

1. resources in massive warehouse-sized data centres are pooled at scale,
2. built from low-cost commodity chips and disks (run time environment of MapReduce, Dryad takes care of fault tolerance, scheduling, and load balancing),
3. share the overhead of cooling, refrigeration, physical security, and backup power.

Execution of MapReduce

The Map step is shown as M in the following slides (Dean and Ghemawat, 2004)
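The MapReduce pattern can be illustrated in a few lines. The sketch below is a single-process toy, not the distributed runtime described by Dean and Ghemawat (which also handles scheduling, fault tolerance, and load balancing); the names `map_reduce`, `count_words`, and `total` are hypothetical.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Run map, group-by-key, and reduce in one process."""
    # Map: every record independently emits (key, value) pairs.
    pairs = []
    for record in records:
        pairs.extend(mapper(record))
    # Group: collect all values that share a key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # Reduce: combine the values for each key, visiting keys in sorted order.
    return {key: reducer(key, groups[key]) for key in sorted(groups)}


def count_words(line):
    """Example mapper: emit (word, 1) for each word in a line."""
    return [(word.lower(), 1) for word in line.split()]


def total(key, values):
    """Example reducer: sum the counts for one word."""
    return sum(values)


counts = map_reduce(["cloud data", "MT data", "cloud cloud"], count_words, total)
print(counts)
```

Because each call to the mapper touches only its own record, the map calls could run on separate machines, which is the property the following slides rely on.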

Decomposition

Task decomposition: How can a problem be decomposed into tasks that can execute concurrently?

Data decomposition: How can a problem's data be decomposed into units that can be operated on relatively independently?

Then dependencies among the tasks: Group tasks, Order tasks, and Data Sharing

Parallel execution of MT data, one per station

[Figure: Station 1 to Station n each feed a Map step (M); the Map outputs are sorted by key and passed to Reduce steps (R).]
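The one-map-per-station layout in this slide might look like the following sketch. It assumes each station's data is already reduced to a small dictionary of period to value; the station names, the numbers, and the functions `map_station`, `reduce_period`, and `run` are all hypothetical, and a thread pool stands in for separate cloud machines.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import groupby
from operator import itemgetter


def map_station(station):
    """Map step: runs independently per station and emits (period, (name, value))."""
    name, readings = station
    return [(period, (name, value)) for period, value in readings.items()]


def reduce_period(period, values):
    """Reduce step: gather every station's value for one period."""
    return period, dict(values)


def run(stations):
    # One map task per station, executed concurrently.
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_station, stations))
    # Sort by key so that all values for a period are adjacent.
    pairs = sorted((pair for result in mapped for pair in result), key=itemgetter(0))
    # One reduce call per distinct key.
    return [
        reduce_period(period, [value for _, value in group])
        for period, group in groupby(pairs, key=itemgetter(0))
    ]


# Made-up example data: period in seconds mapped to an illustrative value.
stations = [
    ("Station 1", {1.0: 120.0, 10.0: 80.0}),
    ("Station 2", {1.0: 95.0, 10.0: 60.0}),
]
profile = run(stations)
print(profile)
```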

Parallel execution of gridded exploration data, by using sub-grids when the original is too big to do as one grid

[Figure: Form sub-grids, then Map steps (M), then Reduce steps (R), then Re-combine.]

Concrete example: Map step is an existing MATLAB program running on Amazon EC2
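The sub-grid idea can be sketched as split, process each tile, and re-combine. The example below assumes a point-wise map step, so tiles need no overlap at their edges (real gridded processing may need overlapping borders); `process_tile` is a stand-in for the existing program mentioned on the slide, and all function names are hypothetical.

```python
def split_into_subgrids(grid, tile_rows, tile_cols):
    """Cut a 2-D grid (list of rows) into tiles, remembering each tile's origin."""
    tiles = []
    for r in range(0, len(grid), tile_rows):
        for c in range(0, len(grid[0]), tile_cols):
            tile = [row[c:c + tile_cols] for row in grid[r:r + tile_rows]]
            tiles.append(((r, c), tile))
    return tiles


def process_tile(tile):
    """Stand-in map step: here it simply doubles every cell."""
    return [[2 * value for value in row] for row in tile]


def recombine(tiles, n_rows, n_cols):
    """Write each processed tile back at its origin to rebuild the full grid."""
    out = [[None] * n_cols for _ in range(n_rows)]
    for (r, c), tile in tiles:
        for i, row in enumerate(tile):
            out[r + i][c:c + len(row)] = row
    return out


grid = [[r * 4 + c for c in range(4)] for r in range(3)]
tiles = split_into_subgrids(grid, 2, 2)
processed = [(origin, process_tile(tile)) for origin, tile in tiles]
result = recombine(processed, 3, 4)
print(result)
```

Each `process_tile` call depends only on its own tile, so the tiles could be sent to separate cloud instances and re-combined afterwards.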

Water: Data collection, management, and analysis in the cloud

Sensors
- Wireless ad-hoc networks: mesh networked motes with sensors
- 40 mm
- 10 years on 2 AA batteries
- Potentially energy scavenging, too
- gateway
- satellite

Data collection, aggregation: high volumes of complex heterogeneous data

Metadata and databases of interest
- River data: from sensors (both mobile, moored)
- Existing databases: Weather, Aquifer, River, Irrigation
- Remote sense (satellite)
- Historical photos
- etc

Data integration / data fusion
Data use
Data clean
Data analysis
Data repurpose
Visualisation

Organisations (water, government, regulators, market operators, and researchers) will mine this data.

www.pacific-challenge.com

Craig Mudge 13.9.2010 24.9.2010

Academy Working Group: Cloud computing at peta-scale

1. Alex Zelinsky, CSIRO Group Executive, 17 May 2010: “The Academy project has been a real catalyst for getting the cloud computing agenda moving forward in Australia.”
2. Summer internships: cloud computing. $1,000 prize won by Jinhui Yao for his security project in an internship hosted by CSIRO
3. Report to be launched October 14 in Canberra

www.cloudinnovation.com.au

NBN: fiber/wireless net connecting mobile and fixed clients to a cloud computing infrastructure for applications & content

[Figure: the NBN links Cloud Computing (Services & Content) and Television Content with Mobile Clients and Fixed Clients & Client Nets.]

Computer person’s view of NBN: “Continuous Services i.e. apps & Client Connected Devices”

[Figure: the NBN links Cloud Computing (Services & Content) and Television Content with Mobile Clients, Connected Devices, and Fixed Clients & Client Nets.]

Our cloud service providers

Google, Amazon, Microsoft Azure, Dropbox

[Figure: services used across these providers: document sharing, computation, calendar, storage, code re-use, search.]

Our first application domain, magnetotellurics




[Figure: processing steps for MT station data from logging in the field (Station 1, Station 2, ..., Station n). Steps shown: Clean; Broadband processing; E field conversion to standard units; BIRRP; Convert to EDI (one per station); inspect with GMT plots; Forward Modelling and Inversion.]

Outputs from BIRRP are
(a) impedance Z, where E = ZB
(b) coherence data
(c) apparent resistivity and phase
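The link between the impedance Z (where E = ZB) and the apparent resistivity and phase can be written out directly. This is a minimal sketch, assuming a single scalar impedance with E in V/m and B in tesla (in practice Z is a tensor estimated per frequency); the function name is illustrative, and this is not BIRRP's own code.

```python
import cmath
import math

MU_0 = 4e-7 * math.pi  # magnetic permeability of free space, H/m


def apparent_resistivity_and_phase(z, period_s):
    """Return (apparent resistivity in ohm-metres, phase in degrees) for z = E/B."""
    omega = 2.0 * math.pi / period_s
    rho_a = MU_0 / omega * abs(z) ** 2
    phase_deg = math.degrees(cmath.phase(z))
    return rho_a, phase_deg


# Consistency check on a uniform half-space, whose impedance is known analytically.
rho_true, period = 100.0, 10.0
omega = 2.0 * math.pi / period
z_halfspace = cmath.sqrt(1j * omega * rho_true / MU_0)
rho_a, phase = apparent_resistivity_and_phase(z_halfspace, period)
print(f"apparent resistivity {rho_a:.1f} ohm m, phase {phase:.1f} degrees")
```

For a uniform half-space the apparent resistivity equals the true resistivity and the phase is 45 degrees, which gives a quick sanity check on units.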

Time series

Apparent resistivity

Forward model and inversion


[Flowchart: Start; Compute MT response of new model; Compare model response and MT observed data; Required misfit? If Y, stop. If N: Exceeded max # of iterations? If Y, stop. If N, Update model and compute the response again.]
MT Processing

1. Time series data from stations (currently ~ 24 hours)
   - Remove outliers
   - To frequency domain
   - Apply BIRRP (Chave and Thomson 1989), a robust method
   - Produces resistivity by frequency, and phase
2. Inversion to produce subsurface image (Siripunvaraporn et al. 2005) (currently ~3 to 4 weeks for 3D)

Chave and Thomson. Bounded influence magnetotelluric response function estimation. Geophys. Jnl. Int. 1989

Siripunvaraporn, Egbert, Lenbury, and Uyeshima. Three dimensional magnetotelluric inversion: data-space method. Physics of the Earth and Planetary Interiors 150. 2005
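The first two processing steps named on this slide (remove outliers, then go to the frequency domain) can be illustrated with a small standard-library sketch. It assumes a median-absolute-deviation test for outliers and a plain discrete Fourier transform; this is a stand-in for illustration and not the robust estimation that BIRRP performs. The function names, the threshold, and the synthetic signal are hypothetical.

```python
import cmath
import math
import statistics


def remove_outliers(samples, threshold=5.0):
    """Replace samples further than threshold * MAD from the median with the median."""
    med = statistics.median(samples)
    mad = statistics.median(abs(x - med) for x in samples)
    if mad == 0:
        return list(samples)
    return [x if abs(x - med) <= threshold * mad else med for x in samples]


def dft(samples):
    """Plain O(n^2) discrete Fourier transform; adequate for a short demonstration."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
        for k in range(n)
    ]


# Synthetic time series: 4 cycles of a sine wave over 64 samples, plus one spike.
n = 64
signal = [math.sin(2 * math.pi * 4 * t / n) for t in range(n)]
signal[10] = 100.0
cleaned = remove_outliers(signal)
spectrum = dft(cleaned)
peak = max(range(1, n // 2), key=lambda k: abs(spectrum[k]))
print("dominant frequency bin:", peak)
```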

Reflections, September 2010

Value of cloud for PIRSA, our MT processing, and CRC DET

1. Access to cheap flexible computing
   1. Amazon runs Fortran, Matlab, Python, etc. E.g., T Dhu’s gridded execution
   2. On-demand purchase of a couple of hours of a more powerful computer (generally in memory: 8 Gbytes, for example); pricing is growing in sophistication: spot pricing, micro-instances, etc.
2. Parallel execution
   1. Easy to get concurrent execution of steps, e.g., 45 stations
   2. Parallel within a step (Google’s MapReduce and Dryad/LINQ) is hard work, but have made a little progress
3. Our future work on integration in multi-layered databases has been strongly endorsed

Disappointments

Honours student gave up on Visualisation of sub-surface layers using Bing/Google Earth

eScience workflow was a major contribution (unexpected)

1. Less human interaction, repeatable, provenance, sharing of workflows internationally
2. Increasingly important, as volume of data grows

No machines Lab: “built first cloud based server, which is the SVN server for C3 Lab in the Amazon EC2 cloud.”

Craig Mudge 29.9.2010

Scientific Workflow Systems

Bill Howe, UW, 3/12/09

Value proposition: More time on science, less time on code, admin

How: By providing language emphasizing sharing, reuse, reproducibility, rapid prototyping, efficiency

- Provenance
- Visual programming
- Integration with domain-specific tools
- Scheduling
- Data curation
2010: Honours project in Geophysics, Tristan Wurst

Steps in MT processing

Porosity Joint Inversion

Invert for a single parameter, to which both techniques are sensitive (Rachel Maier, 2010)

[Figure: Renmark Trough sections with NE and SW ends marked. Panels: Seismic constrained Gravity; MT Inversion; Joint Inversion. (Rachel Maier, 2010)]

[Figure: Data Compute and geologist’s data integrations; Data logging with near real-time feedback; Sub-surface]

Future areas

1. Seismic
2. Inversion and forward modelling in general
3. Rapid inversion, too
4. Data integration or data fusion across multiple layers
5. Data mining

Vision: A geologist steering a drill in real time, using real-time sensing of the sub-surface and updating geological models, while referring to her cloud-based data sets and collaborating with her team back home

[Figure: Geologist in field, steering; drilling machine control system; Sensing with a dozen or more sensors (Seismic, XRF, Resistivity, etc); Data Compute and geologist’s data integrations (Seismic, Satellite, MT, Petrophysical, Cores, Density, etc); Collaboration; Sub-surface]

www.cloudinnovation.com.au

craig.mudge@adelaide.edu.au

0417 679 266

Searching the Deep Earth: sustaining your wealth for the next century

High Flyers Think Tank, Canberra, 19-20 Aug 2010

from draft report:

... nationally coordinated program to deploy new geophysical tools (magneto telluric, passive seismic) and methods (geochemical) integrated with a comprehensive drilling program. ... next, using petascale computing, storage, and network resources these data will be integrated into multi-dimensional databases ...


The Power Wall


www.pacific-challenge.com