Calvalus

CALVALUS
FULL MISSION EO CAL/VAL, PROCESSING AND EXPLOITATION SERVICES

NORMAN FOMFERRA, MARTIN BOETTCHER, MARCO ZUEHLKE, CARSTEN BROCKMANN
BROCKMANN CONSULT GMBH
Models for scientific exploitation of EO data * ESRIN * 12.10.2012


Outline

- Objectives and achievements
- Apache Hadoop in five slides
- Calvalus = Hadoop for EO
- Calvalus bulk processing

There was a dream …

- easily exploit full mission EO archives
- have a powerful and affordable multi-mission processing infrastructure
- generate products using full mission datasets, with new algorithms and algorithm versions
- aggregate results in the temporal and spatial dimensions
- test new ideas in a rapid prototyping approach
- have a tool to perform calibration and validation on full mission archives as the basis for reliable scientific conclusions

Jeffrey Dean and Sanjay Ghemawat, Google, 2004: "MapReduce: Simplified Data Processing on Large Clusters" *

* Sixth Symposium on Operating Systems Design and Implementation; San Francisco, CA, 2004


Calvalus for Land Cover CCI Pre-processing

Robust production

Generation of 7-day composites of surface reflectance from full mission MERIS FRS and RR for CCI Land Cover is a data- and compute-intensive automated job that runs for 3 months on a 72-node Calvalus/Hadoop cluster.

Quicklook generation for full mission MERIS FRS and RR reads and processes 150 TB of input data in 10 hours. This is about 50 Gbit/s.

Other full mission processes take between these two times.
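The quicklook figures can be turned into an average read rate with a few lines of arithmetic. The sketch below assumes decimal terabytes and assumes the quicklook job ran on the same 72 nodes as the composite job; the slide states neither. Taken at face value, 150 TB in 10 hours averages roughly 33 Gbit/s, so the quoted "about 50 Gbit/s" may refer to a peak rate or a different accounting, which the slide does not specify.

```python
# Average read rate implied by the quicklook figures on the slide.
# Assumption: TB is decimal (10**12 bytes); the slide does not say.
input_bytes = 150 * 10**12      # 150 TB of MERIS FRS and RR input
duration_s = 10 * 3600          # 10 hours of wall-clock time
nodes = 72                      # assumption: same cluster size as the composite job

rate_gbit_s = input_bytes * 8 / duration_s / 10**9
per_node_mbyte_s = input_bytes / duration_s / nodes / 10**6

print(f"aggregate: {rate_gbit_s:.1f} Gbit/s")
print(f"per node:  {per_node_mbyte_s:.0f} MB/s")
```

Under these assumptions each node sustains a read rate that a few local disks can deliver, which is the point of keeping the data on the compute nodes.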


Projects using Calvalus

- ESA CoastColour: 6 years MERIS FR, 27 regions
- ESA Land Cover CCI: pre-processing, full mission weekly L3 from MERIS and SPOT VGT
- ESA Ocean Colour CCI: algorithm improvement cycle, MODIS, SeaWiFS, MERIS
- GlobVeg: global FAPAR and LAI from MERIS
- Prevue: MERIS full mission subset extraction
- Fronts: MERIS detection of fronts
- Diversity II: biodiversity of lakes and drylands

Hadoop = HDFS + jobs/tasks + MapReduce

Archive-centric approach
- network storage
- data are transferred on the network
- risk of network bottleneck

Hadoop approach
- data-local processing
- tasks are transferred on the network
- good scalability

[Diagram: a compute cluster reading from a network data archive, compared with direct, data-local processing]

Cluster hardware and network

- standard hardware
- Calvalus additions for I/O and development

[Diagram: compute nodes 1 to n, each with a local disk; a master; a feeder connected to an external data source or destination; a test server (test 1, vm1)]

Hadoop Distributed File System

- distributed file system HDFS on local disks of compute nodes
- transparent, optimised data-local access
- data replication
- automated recovery
- continued service
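The replication and recovery points above can be illustrated with a toy model. This is a minimal sketch, not HDFS code: `place_blocks` and `recover` are hypothetical helpers, placement is plain round-robin (real HDFS placement is rack-aware and load-balanced), and a replication factor of 3 is used because that is the HDFS default.

```python
import itertools

def place_blocks(blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin.

    Assumes replication <= len(nodes), so consecutive picks are distinct.
    """
    ring = itertools.cycle(nodes)
    return {b: {next(ring) for _ in range(replication)} for b in blocks}

def recover(placement, failed, nodes, replication=3):
    """After a node failure, copy under-replicated blocks to live nodes."""
    live = [n for n in nodes if n != failed]
    for holders in placement.values():
        holders.discard(failed)              # replicas on the dead node are gone
        for candidate in live:
            if len(holders) >= replication:
                break
            holders.add(candidate)           # no-op if it already holds a copy
    return placement

nodes = [f"node{i}" for i in range(1, 6)]
placement = place_blocks(["blk0", "blk1", "blk2", "blk3"], nodes)
placement = recover(placement, failed="node2", nodes=nodes)
```

After the simulated failure every block is back at three replicas on live nodes, so reads can continue without interruption.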


Hadoop Job Scheduling

- flexible granularity of inputs defined by split functions (for EO: one file → one split)
- massively parallel processing, task pull
- takes failure into account, automated re-attempt, optional speculative execution
- job queues, priorities, fair sharing among projects

[Diagram: a job's input set is divided into input splits; each split is handled by one task with data-local processing]
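The scheduling behaviour listed above (one split per file, idle nodes pulling tasks, preference for local data, automated re-attempts) can be simulated in a few lines. This is a sketch under stated assumptions, not the Hadoop scheduler: `run_job`, the node names, the file names and the `flaky` processor are all hypothetical, and speculative execution and job queues are left out.

```python
import random
from collections import deque

def run_job(splits, nodes, process, max_attempts=4, seed=0):
    """Simulate pull-based scheduling: idle nodes ask for work.

    splits:  dict split_id -> set of node names holding a local copy
    process: callable(split_id, node) -> result, may raise on failure
    """
    rng = random.Random(seed)
    pending = deque(splits)                 # splits not yet completed
    attempts = {s: 0 for s in splits}
    results, local_hits = {}, 0
    while pending:
        node = rng.choice(nodes)            # some node becomes idle and pulls
        # Prefer a split stored on this node, else take the oldest one.
        split = next((s for s in pending if node in splits[s]), pending[0])
        pending.remove(split)
        attempts[split] += 1
        try:
            results[split] = process(split, node)
            local_hits += node in splits[split]
        except Exception:
            if attempts[split] >= max_attempts:
                raise RuntimeError(f"split {split} failed {max_attempts} times")
            pending.append(split)           # automated re-attempt
    return results, local_hits, attempts

nodes = ["node1", "node2", "node3"]
splits = {f"file{i}.N1": {nodes[i % 3]} for i in range(6)}
failed_once = set()

def flaky(split, node):
    # Hypothetical processor: file2 fails on its first attempt only.
    if split == "file2.N1" and split not in failed_once:
        failed_once.add(split)
        raise IOError("simulated task failure")
    return f"{split}->L2@{node}"

results, local_hits, attempts = run_job(splits, nodes, flaky)
```

All six splits complete, and the one simulated failure costs a single extra attempt rather than the whole job.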


Parallel aggregation with MapReduce

- data-local access of inputs
- a well-selected sorting and partitioning function
- generation of the output in parts that can be simply concatenated

[Figure label: 500 .... 50000]
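Why the choice of partitioning function lets output parts be concatenated can be shown with a small in-memory sketch. It assumes integer keys in a known range and a range partitioner, so that each reducer owns a contiguous block of keys; `partition`, `map_reduce` and the sample data are hypothetical and do not use the Hadoop API.

```python
from collections import defaultdict

def partition(key, num_keys, num_parts):
    """Range partitioner: contiguous key ranges go to the same reducer."""
    return key * num_parts // num_keys

def map_reduce(inputs, mapper, reducer, num_keys, num_parts):
    # Map phase: every input is processed independently (data-local in Hadoop).
    shuffled = [defaultdict(list) for _ in range(num_parts)]
    for item in inputs:
        for key, value in mapper(item):
            shuffled[partition(key, num_keys, num_parts)][key].append(value)
    # Reduce phase: each part sorts its own keys and writes its own output.
    parts = []
    for bucket in shuffled:
        parts.append([(k, reducer(k, bucket[k])) for k in sorted(bucket)])
    return parts

# Hypothetical example: average a value per bin index (0..7) over 3 inputs.
inputs = [[(0, 1.0), (5, 2.0)], [(5, 4.0), (7, 1.0)], [(0, 3.0), (3, 6.0)]]
parts = map_reduce(inputs, mapper=iter,
                   reducer=lambda k, vs: sum(vs) / len(vs),
                   num_keys=8, num_parts=4)
output = [rec for part in parts for rec in part]   # plain concatenation
```

Because part i only holds keys smaller than those of part i+1, joining the parts in order yields a globally sorted result without a final merge step.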



CALVALUS = HADOOP FOR EARTH OBSERVATION

L2 Bulk Processing Realisation

- MERIS RR L1, North Sea, 3 days
- CoastColour NN L2 processor
- 6 minutes (22 nodes)
- output: L2 files

[Diagram: each L1 file is processed by an L2 processor (mapper task) into one L2 file]

Match-up Analysis Realisation

- MERIS RR L1, global, 3 months
- CoastColour C2W processor
- NOMAD in-situ dataset
- 6 minutes (22 nodes)
- scatter plots and pixel extraction

[Diagram: each L1 file is processed by an L2 processor and matcher (mapper task) into output records; the MA output generator (reducer task) takes these as input records and produces the MA report]
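The mapper/reducer split of the match-up analysis can be sketched as follows. This is an illustration only: the record layout, the 3-hour time window, the grid-cell matching and the function names are assumptions, and a plain dictionary lookup stands in for running the L2 processor on an L1 file.

```python
from statistics import mean

def matchup_mapper(scene, in_situ, max_dt_s=3 * 3600):
    """Emit one output record per in-situ point covered by this scene.

    scene: dict with 'time' (s) and 'pixels' {(row, col): value};
    stands in for running the L2 processor on one L1 file.
    """
    for rec in in_situ:
        value = scene["pixels"].get(rec["cell"])
        if value is not None and abs(scene["time"] - rec["time"]) <= max_dt_s:
            yield {"site": rec["site"], "in_situ": rec["value"], "satellite": value}

def matchup_reducer(records):
    """Gather all records into the numbers behind a scatter plot."""
    x = [r["in_situ"] for r in records]
    y = [r["satellite"] for r in records]
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return {"n": len(records), "slope": slope, "intercept": my - slope * mx,
            "bias": mean(b - a for a, b in zip(x, y))}

in_situ = [{"site": "A", "cell": (1, 1), "time": 1000, "value": 1.0},
           {"site": "B", "cell": (2, 2), "time": 5000, "value": 2.0},
           {"site": "C", "cell": (9, 9), "time": 5000, "value": 3.0}]
scenes = [{"time": 1200, "pixels": {(1, 1): 1.5}},
          {"time": 5200, "pixels": {(2, 2): 2.5, (9, 9): 3.5}},
          {"time": 90000, "pixels": {(1, 1): 9.9}}]   # too late to match
records = [r for s in scenes for r in matchup_mapper(s, in_situ)]
report = matchup_reducer(records)
```

Only the small match-up records travel to the single reducer, which is why one reducer task is enough even when the mappers read months of global data.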


L2/L3 Processing Realisation

- MERIS RR L1, global, 10-day
- CoastColour C2W processor
- 1.5 hours (22 nodes)
- 1 L3 product

[Diagram: each L1 file goes through L2 processing and spatial binning (mapper task); the spatial bins go to L3 temporal binning (reducer tasks); the temporal bins go to L3 formatting (staging), which writes the L3 file(s)]
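The division of labour between spatial binning in the mappers and temporal binning in the reducers can be sketched like this. The regular 1-degree grid, the plain mean as aggregator and all function names are assumptions made for illustration; the slides do not describe the actual binning scheme.

```python
from collections import defaultdict

def bin_index(lat, lon, cell_deg=1.0):
    """Hypothetical regular lat/lon grid; real L3 grids are often equal-area."""
    row = int((90.0 - lat) // cell_deg)
    col = int((lon + 180.0) // cell_deg)
    return row * int(360 / cell_deg) + col

def spatial_binning(l2_pixels):
    """Mapper side: collapse one product's pixels into (sum, count) per bin."""
    bins = defaultdict(lambda: [0.0, 0])
    for lat, lon, value in l2_pixels:
        b = bins[bin_index(lat, lon)]
        b[0] += value
        b[1] += 1
    return {k: tuple(v) for k, v in bins.items()}

def temporal_binning(spatial_bins_per_product):
    """Reducer side: merge the spatial bins of all products, bin by bin."""
    merged = defaultdict(lambda: [0.0, 0])
    for bins in spatial_bins_per_product:
        for idx, (s, n) in bins.items():
            merged[idx][0] += s
            merged[idx][1] += n
    return {idx: s / n for idx, (s, n) in sorted(merged.items())}

# Two hypothetical L2 products given as (lat, lon, value) pixels.
day1 = [(54.2, 7.1, 2.0), (54.8, 7.9, 4.0), (10.5, 20.5, 1.0)]
day2 = [(54.5, 7.5, 6.0)]
l3 = temporal_binning([spatial_binning(day1), spatial_binning(day2)])
```

Carrying (sum, count) pairs instead of means keeps the merge associative, so spatial bins can be combined in any order and across any number of reducers.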


Trend Analysis Realisation

- MERIS RR L1, South Pacific Gyre, 2002-2010, first 4 days of a month
- CoastColour C2W processor
- 30 minutes (22 nodes)
- time-series plots and data

[Diagram: two parallel L2/L3 chains (L2 processing and spatial binning in mapper tasks, L3 temporal binning in reducer tasks) feed their temporal bins to TA formatting (staging), which produces the TA report]

Processor integration

- Adapter for Unix executables (C++, Fortran, Python, ...)
- Adapter for BEAM GPF operators
- Concurrent processor versions in the system
- Automated deployment of processor bundles at runtime
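What an adapter for Unix executables has to do can be sketched with the standard library. The calling convention used here (input path and output path as the last two arguments, non-zero exit code on failure) is an assumption for illustration; the slides do not specify the interface Calvalus expects, and `run_executable` and `toy_processor` are hypothetical names.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

def run_executable(command, input_path, output_dir, timeout_s=3600):
    """Run one processor call on a single input file and return the output path."""
    output_path = Path(output_dir) / (Path(input_path).stem + ".out")
    proc = subprocess.run([*command, str(input_path), str(output_path)],
                          capture_output=True, text=True, timeout=timeout_s)
    if proc.returncode != 0:
        # A non-zero exit marks the task as failed so a scheduler could retry it.
        raise RuntimeError(f"processor failed: {proc.stderr.strip()}")
    return output_path

# Stand-in "processor": a Python one-liner that upper-cases its input file.
# Any executable with the same (input, output) argument convention would do.
toy_processor = [
    sys.executable, "-c",
    "import sys; open(sys.argv[2], 'w').write(open(sys.argv[1]).read().upper())",
]

with tempfile.TemporaryDirectory() as tmp:
    l1 = Path(tmp) / "scene.txt"
    l1.write_text("radiance")
    l2 = run_executable(toy_processor, l1, tmp)
    result = l2.read_text()
```

Wrapping the executable behind one function keeps the processor itself unchanged, which is what allows code in any language to run as a task.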


Calvalus + BEAM for data streaming

Supported by BEAM Graph Processing Framework
- Access to data via reader/writer objects instead of files
- Operator chaining to build processors from modules
- Tile cache and pull principle for in-memory processing

Hadoop MapReduce for partitioning and streaming
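Operator chaining with a tile cache and the pull principle can be illustrated with two small classes. This sketch is in Python for brevity and does not use the BEAM GPF API (which is Java); `Source`, `Operator` and the sample operators are hypothetical.

```python
class Source:
    """Hypothetical reader object: produces tiles on request, not whole files."""
    def __init__(self, tiles):
        self.tiles, self.reads = tiles, 0

    def get_tile(self, index):
        self.reads += 1
        return self.tiles[index]

class Operator:
    """One processing module; pulls the tile it needs from its upstream node."""
    def __init__(self, upstream, func):
        self.upstream, self.func, self.cache = upstream, func, {}

    def get_tile(self, index):
        if index not in self.cache:             # tile cache: compute once
            tile = self.upstream.get_tile(index)
            self.cache[index] = [self.func(v) for v in tile]
        return self.cache[index]

# Chain: source -> calibrate -> mask negative values
source = Source({0: [1.0, -2.0], 1: [3.0, 4.0]})
calibrated = Operator(source, lambda v: v * 0.5)
masked = Operator(calibrated, lambda v: max(v, 0.0))

first = masked.get_tile(0)      # triggers computation of tile 0 only
again = masked.get_tile(0)      # served from the cache
```

Requesting a tile from the last operator pulls exactly the upstream tiles it depends on, so intermediate products never have to be written to disk.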


Quality check in bulk processing workflows

[Workflow diagram, labels as extracted: FRS/RR L1B; autom. QC; inventory; QL gen; 1 day QL; visual QC; black list; AMORGOS; ORB; ATT; GETASSE; error report; feedback; FRG/RRG L1B; L2 proc.; SDR; L3 proc.; 7 day SR compo; QL gen; SR QL; visual QC; black list; autom. QC; inventory]

700 inputs with issues identified in MERIS L1B

Bulk production control for full mission reprocessing

[Diagram: components Processing Monitor, Request Queue, Workflow engine and Resource management. Labels as extracted: start bulk production; concurrent processing steps; progress observation; parameters; sequencing; resources; constraints; report status; years, increasing two months at a time; processing workflow; processor versions, ...]

Jobs and tasks to be managed

Workflow step                          Bulks   Jobs     Tasks    Inputs   Outputs
Input MERIS FRS+RR 2002-12 (150 TB)
auto-QA+inventory                          1     20    210020    210000        20
QL daily                                         20    217300    210000      7300
QL scenes                                        20    210000    210000    210000
visual QA screening inputs (7300+)
AMORGOS geocoding                          1    240    210000    210000    210000
Level 2 SDR processing                          240    210000    210000    210000
Level 3 SR 7-day composites                1   1040    247440    210000   1040000
QL SR                                          1040   1041040   1040000      1040
visual QA screening outputs (1040+)
SR result export ((10), 60 TB)
Sum                                        3   2620   2345800
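The totals in the Sum row can be reproduced from the per-step figures listed above. The short check below only re-adds the numbers from the slide; the variable names are illustrative, and the steps without job or task counts (visual screening, export) are left out.

```python
# (workflow step, jobs, tasks) as listed on the slide
steps = [
    ("auto-QA+inventory",            20,  210020),
    ("QL daily",                     20,  217300),
    ("QL scenes",                    20,  210000),
    ("AMORGOS geocoding",           240,  210000),
    ("Level 2 SDR processing",      240,  210000),
    ("Level 3 SR 7-day composites", 1040,  247440),
    ("QL SR",                       1040, 1041040),
]

total_jobs = sum(jobs for _, jobs, _ in steps)
total_tasks = sum(tasks for _, _, tasks in steps)
tasks_per_job = total_tasks / total_jobs

print(total_jobs, total_tasks, round(tasks_per_job))
```

The sums match the slide (2620 jobs, 2345800 tasks), which works out to several hundred tasks per job on average.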


Calvalus portal for on-demand processing

- input set selection
- processor versions
- processing parameters
- in-situ data for matchup analysis
- variables for aggregation
- trend analysis

Summary

- Calvalus is a multi-mission, full mission data processing system for bulk (re)processing, data analysis and algorithm validation
- Calvalus is based on the open source middleware Apache Hadoop and implements massively parallel data-local processing
- Calvalus integrates processors of the BEAM GPF processing framework and Unix executables in any programming language
- Calvalus is successfully in use by various projects and will be further developed

Acknowledgement: The initial Calvalus idea was developed and its realisation was funded by the European Space Agency under the SME-LET programme.

Reflection points

- The adequate hardware infrastructure for Hadoop is different from the current trend of virtualisation and network storage (transparency vs. knowledge of data location).
- Adapted optimised solutions may have a shorter life cycle than generic, standardised ones (processor interfaces that support data streaming vs. file interface).
- Historical missions (ENVISAT) are not the problem. Are we prepared for Sentinel data?