EVA Meeting 11/17/2009 Science and Technology Center University of New Mexico


General Agenda

Identify Problems day 1

Discuss solutions day 2

Implementation day 3


Introductions around the table

1. What are the grand challenges in doing data-intensive science?



Joint models using interacting covariates that affect multiple outcomes.



Bird migration, insect populations, and emergence time: build integrated models across all sensors. Most models investigate only one outcome.

Multiple species and multiple simultaneous outcomes



Connecting climate change to natural resource management. Develop decision support tools for land managers.


Data fusion of multiple diverse data streams and their uncertainties: multiple protocols, preprocessing issues.



Logistics: data acquisition, data availability, system/processing time



Lack of data architecture and data lifecycle for supporting data-intensive computing (processing pipelines)



Paucity of user-friendly tools for managing and using data and metadata. Tools may be available, but education and outreach are needed.



Discovery, access, and understanding of data and process tools. Workflows to analyze data and create a data product.



Dynamic community-generated metadata to understand data: how were they collected? Provenance, versions, uncertainties.

Metadata standards, versioning



Developing what-if scenarios for voluminous data sets




High-throughput evidence-based decision making (data life-cycle)



Cross-generational sensor fusion



Coherent time series: phenology (observations, trends, and forecasts)



Sharing of methodology and process insight





(Networked) data analysis



Paucity of the data needed to address some issues/problems


Integrating across disciplines

Education and outreach for data-intensive science

Understanding stakeholders' needs so that data and tools can be used in decision making


2. Overview of DataOne

3. Overview of EVA working group goals and direction.


Identify and develop data-intensive Exploration, Visualization, and Analysis exemplars to serve as the foundation for the creation of workflows …


Phenology Interoperability Project





5. What are some of the grand-challenge, data-intensive science questions that you would like to help EVA work on?



Criteria for data to highlight DataOne

Important problems, or big science management, or political



Collaborative use of infrastructure: data -> analysis -> visualization



Science driven, but what this group wants to be



Data from disparate data sets & multiple scales (large size)



Eurasian Collared Dove discussion




Project ideas:



2/9

Hydrological: climate, hydrological, economic. Modeling predictions of future climate change. Relationships between water and various hydrological scenarios. Start with Idaho, but of interest to all western states.



9/11


Spatio-temporal change relating species occurrence & phenology. How to integrate a process model of dispersal (Collared Dove). Bird migration. Decision support tools and feedback




Metabolic biology: biotic and abiotic cycles. Explain species abundance, community composition, and occurrences based on the metabolic theory of ecology. Flux of energy through the system.




Remotely sensed vegetation data related to occurrences. Phenology of pollination, phenology of migration, impacts of human population.

Ecological mismatch theories.




Workflow tool challenge: make processes automatic and enable quick comparative data analysis. Streamlining the process requires visualization embedded in it. A “many-eyes” approach to EVA



Phenology as it relates to economics. Feedback mechanism to detect changes. Feedback to land managers/citizen science. Feedback across sensing networks and connecting to land managers/citizen scientists



Interactions between pattern and process boundaries. This relates to how observations can be linked with machine learning tools.

Incorporating process knowledge into machine learning.



0/9

Energy flux networks related to remotely sensed data and how they relate to migration. CO2 flux deltas -> habitat deltas & vegetation dynamics -> species occurrence



0/3

Data mining of inter-annual canopy development related to ecosystem dynamics. Additional data streams could include web cameras and flux towers.



Relating control of a factor (invasive) with species occurrence changes.




Shifting dynamics of energy usage in agriculture related to species occurrence



Combine NDVI and canopy data to improve the predictive power of species models



0/1

Monitor evolution in real time, gene sequencing in the field



Relate remotely sensed oceanographic primary productivity data to species occurrence




Voting distilled out 3 selections:

1. Hydrology/Climate Ecology. Can observed changes in hydrology, geomorphology, disturbance (fire, insects), and ecological health be attributed to climate change? If so, can we make accurate future predictions?



2. Relation of either species occurrence or phenologies with a variety of predictor variables. Predictor variables can be anything from flux tower chemistry, landcover including MODIS, human population density, climate, or weather. The aim is to develop visualizations, research papers, and decision support tools.


3. Relationship between energy fluxes, vegetation dynamics, and their impact on fire and drought.

Once this was decided, the group divided into tool selection groups.


3 tool selections:

1. Workflow tool challenge

2. Interactions between pattern/process boundaries

3. MODIS NDVI: calibrate methodological tool question



--
Tool Breakout notes


Tool Discussion Group: Kevin Webb, Suresh SanthanaVannan, and Bob Cook

Overall Question



2. Relation of either species occurrence or phenologies with a variety of predictor variables. Predictor variables can be anything from flux tower chemistry, landcover including MODIS, human population density, climate, or weather. The aim is to develop visualizations, research papers, and decision support tools.




Spatial Data (US)

Spatial area: conterminous US

Time range: 2000-2009

MODIS NDVI (250-m; 16-day)

MODIS LandCover (500-m; annual)

MODIS Phenology (500-m; annual)

National Land Cover Database (30-m; 2001 & 2006 (soon))


Human Population Density (1-km; 2000-2015)

Climate/weather data (Daymet) (1-km)

Temp, rain, snow, fog, hail, vapor pressure

Elevation (GTOPO30; NED48; SRTM (90-m); ASTER DEM (30-m))

Freeze (last freeze, first freeze) (?? resolution)


Bird conservation region (polygon)


Point Data


Flux tower (



Analysis tools:

Fragmentation tool: 2 dozen outputs used as predictor variables (% land cover, edge density, fractal index, edge contrast, edge similarity, etc.)

750-m, 1500-m, 15,000-m
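The notes list the fragmentation metrics by name only. As a minimal sketch, assuming a categorical land-cover raster held as a list of rows with square cells of known size (both assumptions), two of them can be computed as below. The edge-density convention used here (metres of edge per hectare, interior boundaries only) follows common landscape-ecology usage and may differ from the tool the group had in mind; the function names are hypothetical.

```python
from typing import List

Grid = List[List[int]]


def percent_land_cover(grid: Grid, land_class: int) -> float:
    """Percentage of cells in the grid that belong to land_class."""
    cells = [value for row in grid for value in row]
    return 100.0 * cells.count(land_class) / len(cells)


def edge_density(grid: Grid, cell_size_m: float) -> float:
    """Metres of class boundary per hectare of a categorical raster.

    Each side shared by two orthogonally adjacent cells of different classes
    counts as one edge segment of length cell_size_m; the outer border of the
    grid is not counted as edge.
    """
    rows, cols = len(grid), len(grid[0])
    segments = 0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols and grid[r][c] != grid[r][c + 1]:
                segments += 1  # boundary with the right-hand neighbour
            if r + 1 < rows and grid[r][c] != grid[r + 1][c]:
                segments += 1  # boundary with the neighbour below
    edge_length_m = segments * cell_size_m
    area_ha = rows * cols * cell_size_m ** 2 / 10_000
    return edge_length_m / area_ha
```

For a 2 x 2 grid of 30-m cells split into two classes by one horizontal boundary, `edge_density([[1, 1], [2, 2]], 30)` gives about 166.7 m/ha. Evaluating such functions over moving windows would be one way to obtain multi-scale versions like those listed above.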



Tool: How to combine data layers having different resolutions and projections (including the resultant uncertainty)?

User-selected aggregation/disaggregation

User-selected target area

Variety of resolutions and projections
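For the user-selected aggregation step, one simple approach, shown here only as a sketch, is block averaging: each coarse cell takes the mean of the fine cells it covers, and their standard deviation serves as a rough measure of the uncertainty introduced. The sketch assumes both layers already share a projection and grid alignment and that the resolution ratio is an integer; reprojection is out of scope, and `aggregate` is a hypothetical name.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]


def aggregate(grid: Grid, factor: int) -> Tuple[Grid, Grid]:
    """Block-average a fine grid by an integer factor.

    Returns (means, std_devs): each coarse cell summarizes a factor x factor
    block of fine cells, and the population standard deviation of that block
    is kept as a simple per-cell uncertainty. Rows or columns that do not
    fill a whole block are dropped.
    """
    n_rows = len(grid) // factor
    n_cols = len(grid[0]) // factor
    means: Grid = []
    stds: Grid = []
    for i in range(n_rows):
        mean_row: List[float] = []
        std_row: List[float] = []
        for j in range(n_cols):
            block = [grid[r][c]
                     for r in range(i * factor, (i + 1) * factor)
                     for c in range(j * factor, (j + 1) * factor)]
            m = sum(block) / len(block)
            variance = sum((v - m) ** 2 for v in block) / len(block)
            mean_row.append(m)
            std_row.append(math.sqrt(variance))
        means.append(mean_row)
        stds.append(std_row)
    return means, stds
```

A zero standard deviation marks coarse cells whose fine cells agree exactly; larger values flag cells where aggregation hides more variability.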




Research Questions

1. Disaggregation of a coarser-resolution product by using a fine-resolution product (land cover)

Assumption is that the coarser variable is related to the finer-resolution product

2. Method to combine multiple point data in a target area
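Both research questions can be illustrated with small sketches. The first function uses a dasymetric-style approach (a swapped-in technique, not one named in the notes): a coarse cell's total, such as a population count, is split among its fine cells in proportion to assumed per-class weights for each cell's land-cover class, which encodes the stated assumption that the coarse variable is related to the finer product. The second simply averages the point observations that fall inside a rectangular target area. The weights, the rectangle, and the function names are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def disaggregate(coarse_total: float, fine_classes: Sequence[int],
                 class_weight: Dict[int, float]) -> List[float]:
    """Split one coarse cell's total among its fine cells.

    fine_classes holds the land-cover class of each fine cell inside the
    coarse cell; class_weight gives the assumed relative density per class
    (missing classes get weight 0). The returned values always sum to
    coarse_total.
    """
    weights = [class_weight.get(c, 0.0) for c in fine_classes]
    total_w = sum(weights)
    if total_w == 0:
        # Land cover carries no information: spread the total evenly.
        return [coarse_total / len(fine_classes)] * len(fine_classes)
    return [coarse_total * w / total_w for w in weights]


def combine_points(points: Sequence[Tuple[float, float, float]],
                   bbox: Tuple[float, float, float, float]) -> Tuple[float, int]:
    """Average the values of (x, y, value) points inside bbox.

    bbox is (xmin, ymin, xmax, ymax). Returns (mean, count); the mean is NaN
    when no point falls in the box.
    """
    xmin, ymin, xmax, ymax = bbox
    inside = [v for x, y, v in points
              if xmin <= x <= xmax and ymin <= y <= ymax]
    if not inside:
        return float("nan"), 0
    return sum(inside) / len(inside), len(inside)
```

Returning the count alongside the mean keeps track of how many observations support each target-area value, which matters when point density varies.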


-----

End of breakout notes
---


Day 2

ORNL Resource presentation

MODIS Consumers: discussion of AKN’s MODIS data usage as a Pathfinder concept

Data Modeling and Visualization

Concrete objectives for EVA project


Timeline



Link MODIS data with bird observations

March 2009


Predictor Variables



Daymet



Leaf Area Index



Human Census



Elevation



NLCD and derivatives



Climate



MODIS Phenology



Distance from water (Jeff)



Stream (EPA)



Wetlands (FWS)


Responses



eBird data (plus potential for more data)



Presence/Absence


Models



STEM


BDT



What are the covariates that best predict arrival?

Detailed discussion questioning the validity of the 2nd modeling process, which uses predictors that were already used to create the model in the first phase.
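Partial dependence, which the Day 3 modeling plan lists as an option, is one common way to ask how strongly a covariate such as NDVI drives predicted arrival. A minimal sketch, assuming any fitted model exposed as a plain Python callable on a dictionary of covariates (this is not the STEM or BDT interface, and the names are hypothetical): fix the covariate at each grid value for every observation and average the predictions. It does not by itself resolve the concern above about reusing first-phase predictors.

```python
from typing import Callable, Dict, List, Sequence

Row = Dict[str, float]


def partial_dependence(model: Callable[[Row], float], data: Sequence[Row],
                       feature: str, grid: Sequence[float]) -> List[float]:
    """Average model prediction with `feature` forced to each grid value.

    All other covariates keep their observed values, so each returned number
    is the mean prediction over the data set at one value of `feature`.
    """
    curve = []
    for value in grid:
        # Copy each row with the chosen covariate overwritten.
        preds = [model({**row, feature: value}) for row in data]
        curve.append(sum(preds) / len(preds))
    return curve
```

A flat curve suggests the covariate contributes little to the model's predictions; a steep one suggests a strong average effect.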


Articulate Questions


What are the key factors that affect the timing of bird migration?


Focus on species that are relevant to resource management, e.g., Willow Flycatcher & Tamarisk


Are invasive species of snakes in FL affecting birds that remain in FL during migration?


What are the characteristics of migratory stopover areas?


Do birds migrate through specific corridors?



Changes in corridors may affect bird migration.

Visualizations


Advanced research exploration


General exploration

What specific interactive tools should DataOne incorporate into the Investigator’s Toolkit?


Data Extractor for x, y, target data from data layers (machine service)


Visualize data (model outputs and inputs): spatial, time series, scatter, etc. (VisTrails)


Make GISIN talk to eBird to MODIS (Register eBird in NBII clearinghouse)


Workflow tools with versioning info and comments


Web Browser enabled tools


Data Discovery


multiple factor searches


Dynamic Metadata


Interactive support for very large data sets (> 10^6 observations)


Contribute data and metadata to ?


Tools to view & understand spatial data


Method for interacting with CCIT



Iterative (evolutionary)



Requirements based



Agile



Outreach/interaction after development
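The “Data Extractor for x, y, target data” item can be pictured as a point-sampling service. The sketch below is an assumption about how such a service might work, not a description of the DataOne toolkit: it assumes a north-up raster with a known upper-left corner and square cells, in the same coordinate system as the query points, and returns the value of the cell under each point. `extract_at_points` is a hypothetical name.

```python
from typing import List, Optional, Sequence, Tuple


def extract_at_points(raster: List[List[float]],
                      origin: Tuple[float, float], cell_size: float,
                      points: Sequence[Tuple[float, float]]
                      ) -> List[Optional[float]]:
    """Return the raster value under each (x, y) point.

    Assumes a north-up grid whose upper-left corner is `origin` (x0, y0)
    and whose square cells are `cell_size` map units wide, in the same
    coordinate system as the points. Points outside the grid yield None.
    """
    x0, y0 = origin
    n_rows, n_cols = len(raster), len(raster[0])
    values: List[Optional[float]] = []
    for x, y in points:
        col = int((x - x0) // cell_size)
        row = int((y0 - y) // cell_size)  # row index grows southward
        if 0 <= row < n_rows and 0 <= col < n_cols:
            values.append(raster[row][col])
        else:
            values.append(None)
    return values
```

Run once per predictor layer, this yields one row of covariates per observation, which is the table shape a species model typically consumes.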



Day 3

1. Review and strategy going forward


Break out into specialized groups



Data organization



Modeling



Visualization workflow

2. Presentation of Software for Automatic Habitat Modeling (SAHM), based on Java and GeoTools (J. Morisette)

Data Prep: Resampler program to match survey area to a common output format so that the model can be applied to the predictors (data preparation)


CSV input, GeoTIFF, R scripts, MAXENT (Java-based)

GeoTools does predictor extraction from predictor TIFFs for hand-off to R modules.

Output: XML, GeoTIFFs, and combined maps

Extents: 500 m, 1.0 km, 2 km
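The notes do not describe how the SAHM resampler works internally. As a hedged illustration of the data-preparation idea, the sketch below resamples a north-up grid to a target cell size (for example 500 m, 1 km, or 2 km) by nearest neighbour, assuming source and target share an upper-left corner and coordinate system. `resample_nearest` is a hypothetical name.

```python
from typing import List

Grid = List[List[float]]


def resample_nearest(grid: Grid, src_cell: float, dst_cell: float) -> Grid:
    """Nearest-neighbour resample of a north-up grid to a new cell size.

    Source and target share the same upper-left corner and coordinate
    system. Each target cell takes the value of the source cell that
    contains the target cell's centre; the target covers as many whole
    cells as fit in the source extent.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    out_rows = int(n_rows * src_cell // dst_cell)
    out_cols = int(n_cols * src_cell // dst_cell)
    out: Grid = []
    for i in range(out_rows):
        centre_y = (i + 0.5) * dst_cell  # distance south of the top edge
        src_r = min(int(centre_y // src_cell), n_rows - 1)
        row = []
        for j in range(out_cols):
            centre_x = (j + 0.5) * dst_cell  # distance east of the left edge
            src_c = min(int(centre_x // src_cell), n_cols - 1)
            row.append(grid[src_r][src_c])
        out.append(row)
    return out
```

Nearest neighbour suits categorical layers such as land cover; for continuous predictors, block averaging or another interpolation may be more appropriate.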

3. Overview of next steps in the process, timelines, etc.

EVA exemplar Overview

1.

Migration phenology

2.

Invasive species monitoring

Challenges

1.

Data Synthesis

2.

Modeling

3.

Workflow and visualization


The Products

1.

Reference Data Set

2.

VisTrails workflow

3.

Migration Atlas of North America (WEB)

4.

Data intensive Science Process paper (Peer review)

5.

Invasive Species Modeling (Peer review)


Deliverables


DataOne goal workflow for P1

1.

Identify a covariate need

2.

Search -> found -> DataOne

Not found -> build it


Provenance, validation …
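The P1 workflow (identify a covariate need, search, then either use what is found or build it and record provenance) can be sketched as a single function. The catalog here is a plain dictionary standing in for a DataOne search, and `build` is any hypothetical function that produces the missing layer; neither reflects an actual DataOne interface, and the validation step is left out.

```python
from datetime import datetime, timezone
from typing import Callable, Dict, Tuple

Catalog = Dict[str, dict]


def get_covariate(name: str, catalog: Catalog,
                  build: Callable[[], dict]) -> Tuple[dict, dict]:
    """Return (dataset, provenance) for a covariate, building it if absent.

    `catalog` is a plain dict standing in for a DataOne search; it is not
    the DataOne API. A newly built data set is registered in the catalog so
    that the next search finds it.
    """
    if name in catalog:
        # Found: reuse the existing product.
        return catalog[name], {"covariate": name, "source": "found in catalog"}
    # Not found: build it, register it, and record where it came from.
    dataset = build()
    catalog[name] = dataset
    provenance = {
        "covariate": name,
        "source": "built",
        "builder": getattr(build, "__name__", "unknown"),
        "created": datetime.now(timezone.utc).isoformat(),
    }
    return dataset, provenance
```

Registering the built product is what lets the next search succeed, which is the reuse the “found -> DataOne” branch relies on.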


Variables: merged dataset (Kevin lead)

1.

MODIS NDVI

2.

PRISM interpolated weather 2007-2008

a.

Possible workflow: PRISM into BIOCLIM

3.

Human Census

4.

Elevation

5.

NLCD canopy, landcover & imperviousness

6.

Distance to water

7.

Bird Conservation regions

8.

Forest Biomass

9.

MODIS phenology
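If “BIOCLIM” in item 2a refers to the standard set of bioclimatic variables derived from monthly temperature and precipitation (an assumption; it may instead mean the BIOCLIM envelope model), a few of them can be computed from twelve monthly PRISM values per cell as sketched below. Whether to use each year (2007, 2008) separately or their monthly means is left open, and `bioclim_subset` is a hypothetical name.

```python
from typing import Dict, Sequence


def bioclim_subset(tmin: Sequence[float], tmax: Sequence[float],
                   prec: Sequence[float]) -> Dict[str, float]:
    """Compute a few BIOCLIM variables from 12 monthly values for one cell."""
    if not (len(tmin) == len(tmax) == len(prec) == 12):
        raise ValueError("expected 12 monthly values for each variable")
    tavg = [(lo + hi) / 2 for lo, hi in zip(tmin, tmax)]
    bio5 = max(tmax)  # max temperature of warmest month
    bio6 = min(tmin)  # min temperature of coldest month
    return {
        "BIO1": sum(tavg) / 12,  # annual mean temperature
        "BIO2": sum(hi - lo for lo, hi in zip(tmin, tmax)) / 12,  # mean diurnal range
        "BIO5": bio5,
        "BIO6": bio6,
        "BIO7": bio5 - bio6,  # temperature annual range
        "BIO12": sum(prec),  # annual precipitation
    }
```

Applying the function cell by cell over the PRISM grids would give one raster per variable, ready to join the merged predictor set.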

Model of Bird migration (Daniel lead)

1.

STEM predictors with NDVI (DF)

2.

Estimate partial dependence directly (DF)

3.

2-step: STEM probability of occurrence -> get arrival -> find association with NDVI

4.

STEM


mechanistic migration process

5.

Modeling arrival dates using ??

6.

Mechanistic white-box


expert

Information driven mechanistic modeling
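Item 3's two-step approach could look like the sketch below, which assumes STEM has already produced a daily probability-of-occurrence series for each site (STEM itself is not reproduced). Arrival is taken as the first day the probability reaches a threshold; the 0.5 default is a hypothetical choice. Association with NDVI is then measured with a plain Pearson correlation across sites. As raised in the Day 2 discussion, if NDVI was already a STEM predictor, part of this association is built in.

```python
import math
from typing import Optional, Sequence


def arrival_day(days: Sequence[int], prob: Sequence[float],
                threshold: float = 0.5) -> Optional[int]:
    """First day on which predicted occurrence probability reaches threshold."""
    for day, p in zip(days, prob):
        if p >= threshold:
            return day
    return None  # the species is never predicted present


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

In use, `arrival_day` would be applied to each site's series, sites with no arrival dropped, and `pearson` computed between the remaining arrival days and the matching NDVI values.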




Workflow (Patrick lead)

1.

Take JM workflow, integrate into VisTrails (Jeff, Patrick Roger Gill, Claudio, Juliana, grad student)

2.

Reference data set and 6 models (DF, KW, Claudio, Juliana, Patrick, grad student, summer student)

3.

Data ingestors and machine services into VisTrails (possible)

4.

Complete workflows for migration analysis.