MRI: Development of an Integrated, Geospatial Analytics Research Instrument


Sept 2011 - August 2014

PI: Naphtali Rishe

Co-PIs: Shu-Ching Chen, Vagelis Hristidis, Tao Li, Ming Zhao


Sr. Investigators: Steven Luis, Scott Graham

Instrument Development Goals

Easily, accurately, and efficiently perform queries of, and analytics on, massive amounts of heterogeneous geospatial data from disparate sources, all in a single, user-centric environment

Integration of TerraFly and BCIN

Integration of detailed Florida datasets

Pilot use for disaster management applications in Florida


Research to be Enabled


Geospatial information and knowledge discovery

Algorithms in data quality control and indexing on heterogeneous, multi-source disaster-related streaming data

Automated and/or semi-automated discovery approaches

Data relationships

Intelligent query/search model

High-performance real-time virtualization architecture and autonomic management

Geospatial data indexing and dissemination

Disaster Dataspace Services Cloud

Key Challenges

Chen, Li, & Luis

Deliver the right information to the right person at the right time in the right format




Present only high quality search results




Deliver the results in constrained operation conditions:



Reduced bandwidth



Mobile/text interface



Crisis situation (Human Factors)




Reconcile conflicting reports, inaccuracies, false information




Integrate data from tens of thousands of real-time sources, in both text and non-text formats, which vary in metadata richness


Key Partners and Outreach


Government: Miami-Dade County Emergency Management & Business Recovery Committee, Palm Beach County Emergency Management & Private-Public Partnership, Broward County Emergency Management, Monroe County Emergency Management

Financial: FloridaFIRST

Professional Assn: Greater Miami Chamber of Commerce

Retailers, Logistics, Manufacturing, Services: IBM, Office Depot, Wal-Mart, Ryder, Greyhound, Beckman Coulter, NCCI, and many others…

Healthcare: South Florida Public Health Institute, Quantum Group


Disaster Vertical Search Engine

System components

A crawler: multiple sources, heterogeneous formats

A storage system: database, indices

A search engine: vertical search


Crawler:

1. Wrapped scripts to efficiently crawl data from heterogeneous sources.

2. Unknown network structures, which prevent advanced analysis and ranking algorithm design.

3. Using the Nutch API to extract information based on our specific needs.

4. Need for storage of the extracted information.

Indexer:

1. Lucene is used for full-text indexing.

2. Support for various data formats.
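To make the idea of full-text indexing concrete, here is a minimal sketch of an inverted index in plain Python. It does not use Lucene and is not the project's indexer; the function names (`tokenize`, `build_index`, `search`) and the sample documents are assumptions chosen for illustration only.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every term of the query (AND search)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample documents, not taken from the crawled data.
docs = {
    1: "Hurricane Irene hits North Carolina",
    2: "Flood warnings in South Florida",
    3: "Irene floods New York",
}
index = build_index(docs)
```

A call such as `search(index, "irene york")` intersects the posting sets of both terms. A production indexer adds stemming, ranking, and on-disk storage, none of which this sketch attempts.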

Crawler: Nutch

Indexer: Lucene

Text, Image, Video

Vertical Search

Our target

Keyword search

Vertical search / a vertical search engine (http://en.wikipedia.org/wiki/Vertical_search):

1. Greater precision due to limited scope.

2. Leverage domain knowledge, including taxonomies and ontologies.

3. Support specific, unique user tasks.

[Figure: vertical search workflow. Components: Web Sites, Websites Processor, Filtering, Storage, Actionable information, Ontology, News Ranking, Situation Assessment, Vetted Sites. Arrows are numbered 1 to 10.]

1. Users/experts specify vetted websites.

2. Filtering that information for our specific usage.

3. Building the site ontology.

4. Storage of actionable information.

5. Ranking algorithm design.

6. Ranking feedback.

7. Situation assessment design.

8. Situation assessment feedback.

9. Actionable information feedback.

10. Pre-processed site seeds based on actionable information.

UI

Vertical Search Engine Overview

Search Results

System Integrated View

Information Clusters

State Level

City Level - FL

City Level - NY

Crawler Architecture

Disaster Crawling Database


A Global Repository of Disaster Information

Temporal-spatial integration for disaster news, situation reports

Integration based on “When”, “Where”, “What happened”:

Disaster Name, Time, Location

For each report, we extract the sentences for a specific disaster at a time and geographic location.

Gather the extracted sentences and generate summarized sentences for a time and a location.
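The integration step can be sketched as grouping extracted sentences by a (disaster name, time, location) key. This is a minimal illustration under stated assumptions: the record field names, the sample records, and the `summarize` placeholder (which simply keeps the first few sentences) are hypothetical and are not the project's actual summarization method.

```python
from collections import defaultdict


def group_sentences(records):
    """Group extracted sentences by (disaster, date, location)."""
    groups = defaultdict(list)
    for rec in records:
        key = (rec["disaster"], rec["date"], rec["location"])
        groups[key].append(rec["sentence"])
    return dict(groups)


def summarize(sentences, max_sentences=2):
    """Placeholder summarizer (assumption): keep the first few sentences."""
    return sentences[:max_sentences]


# Hypothetical records, as they might come out of the extraction step.
records = [
    {"disaster": "Irene", "date": "2011-08-27", "location": "North Carolina",
     "sentence": "Irene made landfall near Cape Lookout."},
    {"disaster": "Irene", "date": "2011-08-27", "location": "North Carolina",
     "sentence": "Power outages were reported along the coast."},
    {"disaster": "Irene", "date": "2011-08-28", "location": "New York",
     "sentence": "Subway service was suspended."},
]

groups = group_sentences(records)
summaries = {key: summarize(sents) for key, sents in groups.items()}
```

Each key of `summaries` then corresponds to one "when, where, what happened" entry in the repository.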

GeoVista (developed by Penn State Univ), http://www.geovista.psu.edu/

[Screenshot labels: Input query; Relevant tweets; Geo-location for each tweet]

ABC news for Hurricane Irene

[Annotated labels: Disaster Name; Time; Geo-locations]

Technical Challenges

Given a text, how do we know the disaster name?

Is “Irene” a girl or a hurricane?

How do we extract the time information?

A news article written on Aug. 25 2011 said, “…Irene touched North Carolina in the last week…” => (Irene, Aug. 25 2011, North Carolina)?

How do we extract the location information?

A news article says “…in South Florida…”; where is “South Florida”? It is not a country, a state, or a city, so how does the computer know this place?


Initial Ideas


Given a text, how do we know the disaster name?

Simple Bayes method: for two consecutive words w1, w2, if P(w2 = <DisasterName> | w1 = “Hurricane”) is high, then w2 is probably a disaster name.
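One way to estimate this conditional probability is by relative frequency over labeled word pairs, with add-one smoothing. The sketch below is an assumption about how the idea could be implemented, not the project's code; the training pairs, the smoothing constant `alpha`, and the 0.6 threshold are all hypothetical.

```python
from collections import Counter


def train(labeled_bigrams):
    """Count, for each first word w1, how often the next word is a disaster name.

    labeled_bigrams: iterable of (w1, w2, is_disaster_name) tuples.
    """
    total = Counter()
    positive = Counter()
    for w1, _w2, is_name in labeled_bigrams:
        key = w1.lower()
        total[key] += 1
        if is_name:
            positive[key] += 1
    return total, positive


def prob_disaster_name(w1, total, positive, alpha=1.0):
    """Smoothed estimate of P(w2 is a disaster name | w1)."""
    key = w1.lower()
    return (positive[key] + alpha) / (total[key] + 2 * alpha)


def candidate_names(tokens, total, positive, threshold=0.6):
    """Return every w2 whose preceding word w1 makes a disaster name likely."""
    return [
        w2
        for w1, w2 in zip(tokens, tokens[1:])
        if prob_disaster_name(w1, total, positive) >= threshold
    ]


# Hypothetical labeled training pairs.
training = [
    ("Hurricane", "Irene", True),
    ("Hurricane", "Katrina", True),
    ("Hurricane", "Andrew", True),
    ("hurricane", "season", False),
    ("my", "sister", False),
]
total, positive = train(training)
```

With these counts, "Irene" is flagged in "Hurricane Irene hit North Carolina" but not in "My friend Irene called". Because the estimate conditions only on w1, any word after "Hurricane" passes the threshold, so a real system would also need evidence about w2 itself.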


How to extract the time information and geo-location?

Rule-based information extraction.


Rule-based IE

Time information:

For example, for “Aug. 27 2011” we can define a regular expression: (Jan|Feb|…|Dec)\.\s\d{1,2}\s\d{2,4}
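The time rule can be sketched in Python as follows. This is a minimal illustration, not the project's extractor: the month list is spelled out in full, two-digit years are assumed to mean 20xx, and the function name `extract_dates` is hypothetical.

```python
import re
from datetime import date

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Month abbreviation, a period, whitespace, day, whitespace, year.
DATE_RE = re.compile(
    r"\b(" + "|".join(MONTHS) + r")\.\s(\d{1,2})\s(\d{2,4})\b"
)


def extract_dates(text):
    """Return all dates of the form 'Aug. 27 2011' found in the text."""
    results = []
    for match in DATE_RE.finditer(text):
        month = MONTHS.index(match.group(1)) + 1
        day = int(match.group(2))
        year = int(match.group(3))
        if year < 100:
            year += 2000  # assumption: two-digit years are in the 2000s
        try:
            results.append(date(year, month, day))
        except ValueError:
            continue  # skip impossible dates such as Feb. 30
    return results
```

Note that this rule only finds explicit dates. A relative expression such as "in the last week" still has to be resolved against the article's publication date.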


Geo information:

For example, for “South Florida” we can define a regular expression: ((North|South|…)\s)?<GeoName>

All geo names can be obtained from the USGS (U.S. Geological Survey).
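The geo rule can be sketched as a regular expression built from a gazetteer. In this illustration the three-name `GAZETTEER` is a hypothetical stand-in for the USGS name list, and the direction words beyond "North" and "South" are assumptions, since the slide elides them.

```python
import re

# Hypothetical stand-in for the USGS gazetteer.
GAZETTEER = ["Florida", "North Carolina", "New York"]

# "North" and "South" come from the slide; the rest are assumed.
DIRECTIONS = ["North", "South", "East", "West", "Central"]


def build_geo_regex(names):
    """Build a pattern: optional direction word, then a known geo name."""
    # Try longer names first so multi-word names are not cut short.
    ordered = sorted(names, key=len, reverse=True)
    name_alt = "|".join(re.escape(name) for name in ordered)
    dir_alt = "|".join(DIRECTIONS)
    return re.compile(r"\b(?:(" + dir_alt + r")\s)?(" + name_alt + r")\b")


def extract_locations(text, geo_re):
    """Return (direction or None, geo name) pairs found in the text."""
    return [(m.group(1), m.group(2)) for m in geo_re.finditer(text)]


geo_re = build_geo_regex(GAZETTEER)
```

Here "South Florida" yields the pair ("South", "Florida"), while "North Carolina" is matched as a gazetteer name with no direction. If the gazetteer also contained a bare "Carolina", the direction reading would win, so such overlaps would need an explicit tie-breaking rule.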

Abstract Disaster Ontology

Destruction: Debris, Floods, Fires, Death, Animal death, Out of service

Disaster: Fires, Floods, Hurricanes, Influenza pandemic, Radiological, Terrorism, Thunderstorms and lightning, Tornadoes, Wildfires

Entity: Service, Bus, Highway, Airport, School, Buildings, Residential buildings, Commercial buildings, Bridge, Road, People, Animal, Tree

Initial Seeds Selection

1. FL

2. NY

3. LA

4. CA

5. OK

6. ND


Related Funding



DHS, “A Research and Educational Framework to Advance Disaster Information Management in Computer Science PhD Programs”

Eugenio Pino and Family Global Entrepreneurship Center, “All-Hazard Disaster Situation Browser (ADSB) on Mobile Devices”

DHS, “Information Delivery and Knowledge Discovery for Hurricane Disaster Management”

NSF CREST, “CREST: Center for Innovative Information Systems Engineering”

NSF, “CAREER: A Collaborative Adaptive Data Sharing Platform”

Purdue University/DHS VACCINE CoE, “A Data Mining Framework for Enhancing Emergency Response Situation Reports with Multi-Agency Multi-Party Multimedia Data”

IBM Faculty Shared University Research Award
