MRI: Development of an Integrated, Geospatial Analytics Research Instrument
Sept 2011 - August 2014
PI: Naphtali Rishe
Co-PIs: Shu-Ching Chen, Vagelis Hristidis, Tao Li, Ming Zhao
Sr. Investigators: Steven Luis, Scott Graham
Instrument Development Goals
Easily, accurately, and efficiently perform queries of, and analytics on, massive amounts of heterogeneous geospatial data from disparate sources, all in a single, user-centric environment
Integration of TerraFly and BCIN
Integration of detailed Florida datasets
Pilot use for disaster management applications in Florida
Research to be Enabled
Geospatial information and knowledge discovery
Algorithms in data quality control and indexing on heterogeneous, multi-source disaster-related streaming data
Automated and/or semi-automated discovery approaches - data relationships
Intelligent query/search model
High-performance real-time virtualization architecture and autonomic management
Geospatial data indexing and dissemination
Disaster Dataspace Services Cloud
Key Challenges - Chen, Li, & Luis
Deliver the right information to the right person at the right time in the right format
• Present only high-quality search results
• Deliver the results in constrained operation conditions:
  • Reduced bandwidth
  • Mobile/text interface
  • Crisis situation (Human Factors)
• Reconcile conflicting reports, inaccuracies, false information
• Integrate data from tens of thousands of real-time sources, in both text and non-text formats, which vary in metadata richness
Key Partners and Outreach
• Government: Miami-Dade County Emergency Management & Business Recovery Committee, Palm Beach County Emergency Management & Private-Public Partnership, Broward County Emergency Management, Monroe County Emergency Management
• Financial: FloridaFIRST
• Professional Assn: Greater Miami Chamber of Commerce
• Retailers, Logistics, Manufacturing, Services: IBM, Office Depot, Wal-Mart, Ryder, Greyhound, Beckman Coulter, NCCI, and many others…
• Healthcare: South Florida Public Health Institute, Quantum Group
Disaster Vertical Search Engine
System components
A crawler
  Multiple sources
  Heterogeneous formats
A storage system
  Database
  Indices
A search engine
  Vertical search
Crawler:
1. Wrapped scripts to efficiently crawl data from heterogeneous sources.
2. Unknown network structures, which prevent advanced analysis and ranking algorithm design.
3. Using the Nutch API to extract information based on our specific needs.
4. Need for storage of the extracted information.
Indexer:
1. Lucene is used for full-text indexing.
2. Support for various data formats.
Crawler - Nutch
Indexer - Lucene
Text, Image, Video
Vertical Search - Our target
Keyword search
Vertical Search/A vertical search engine (http://en.wikipedia.org/wiki/Vertical_search):
1. Greater precision due to limited scope.
2. Leverage domain knowledge, including taxonomies and ontologies.
3. Support specific, unique user tasks.
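[Figure: processing pipeline diagram. Component labels: Web Sites, Websites Processor, Filtering, Storage, Actionable information, Ontology, News, Ranking, Situation Assessment, Vetted Sites. The numbered links 1-10 correspond to the steps listed below.]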
1. Users/experts specify vetted websites.
2. Filtering that information for our specific usage.
3. Building the site ontology.
4. Storage of actionable information.
5. Ranking algorithm design.
6. Ranking feedback.
7. Situation assessment design.
8. Situation assessment feedback.
9. Actionable information feedback.
10. Pre-processed site seeds based on actionable information.
UI
Vertical Search Engine
Overview
Search Results
System Integrated View
Information Clusters - State Level
City Level - FL
City Level - NY
Crawler Architecture
Disaster Crawling Database
A Global Repository of Disaster Information
Temporal-spatial integration for disaster news, situation reports
Integration based on “When”, “Where”, “What happened”: Disaster Name, Time, Location
For each report, we extract the sentences for a specific disaster at a time and geographic location.
Gather the extracted sentences and generate summarized sentences for a time and a location.
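A minimal sketch of the integration step described above, assuming each extracted sentence already carries a disaster name, a time, and a location. The record type `Extracted`, the function names, and the "keep the first few sentences" summarizer are hypothetical placeholders for illustration; the slides do not specify the actual summarization method.

```python
from collections import defaultdict
from typing import NamedTuple


class Extracted(NamedTuple):
    """One sentence pulled from a report (hypothetical record layout)."""
    disaster: str   # "What happened", e.g. "Irene"
    when: str       # "When", as an ISO date string
    where: str      # "Where", a place name
    sentence: str


def integrate(records):
    """Group sentences by (disaster, time, location), dropping exact repeats."""
    groups = defaultdict(list)
    for r in records:
        key = (r.disaster, r.when, r.where)
        if r.sentence not in groups[key]:
            groups[key].append(r.sentence)
    return dict(groups)


def summarize(sentences, max_sentences=2):
    """Placeholder summarizer (assumption): keep the first few sentences."""
    return sentences[:max_sentences]


records = [
    Extracted("Irene", "2011-08-27", "North Carolina", "Irene made landfall."),
    Extracted("Irene", "2011-08-27", "North Carolina", "Power outages reported."),
    Extracted("Irene", "2011-08-27", "North Carolina", "Irene made landfall."),
    Extracted("Irene", "2011-08-28", "New York", "Subway service suspended."),
]
for key, sentences in integrate(records).items():
    print(key, summarize(sentences))
```

Keying on the (disaster, time, location) triple keeps sentences from different reports about the same event together, so any summarizer can later be swapped in per group.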
GeoVista (developed by Penn State Univ.)
http://www.geovista.psu.edu/
[Figure: GeoVista screenshot with callouts: Input query, Relevant Tweets, Geo-location for each tweet]
[Figure: ABC news for Hurricane Irene, annotated with Disaster Name, Time, and Geo-locations]
Technical Challenges
Given a text, how do we know the disaster name?
Is “Irene” a girl or a hurricane?
How do we extract the time information?
A news article written on Aug. 25 2011 said, “…Irene touched North Carolina in the last week…” => (Irene, Aug. 25 2011, North Carolina)?
How do we extract the location information?
A news article says “…in South Florida…”; where, then, is “South Florida”? It is not a country, a state, or a city, so how does the computer know this place?
Initial Ideas
Given a text, how do we know the disaster name?
Simple Bayes method: for two consecutive words w1, w2, if P(w2=<DisasterName> | w1=“Hurricane”) is high, then we know w2 is probably a disaster name.
How do we extract the time information and geo-location?
Rule-based information extraction.
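One way to read the simple Bayes idea for disaster names is to estimate P(w2 is a disaster name | w1) from labeled word pairs and flag w2 when the estimate is high. The sketch below makes that concrete under several assumptions not in the slides: the toy training pairs, the Laplace smoothing, the 0.6 threshold, and all function names are illustrative only.

```python
from collections import Counter


def train(labeled_bigrams):
    """Count, per preceding word w1, how often the next word was a disaster name.

    labeled_bigrams: iterable of (w1, w2, is_disaster_name) tuples.
    """
    total, positive = Counter(), Counter()
    for w1, _w2, is_name in labeled_bigrams:
        total[w1] += 1
        if is_name:
            positive[w1] += 1
    return total, positive


def p_name_given_prev(w1, total, positive, alpha=1.0):
    """Laplace-smoothed estimate of P(w2 is a disaster name | w1)."""
    return (positive[w1] + alpha) / (total[w1] + 2 * alpha)


def candidate_names(tokens, total, positive, threshold=0.6):
    """Return every w2 whose preceding word makes a disaster name likely."""
    return [
        w2
        for w1, w2 in zip(tokens, tokens[1:])
        if p_name_given_prev(w1, total, positive) >= threshold
    ]


# Hypothetical toy training data, for illustration only.
training = [
    ("Hurricane", "Irene", True),
    ("Hurricane", "Andrew", True),
    ("Hurricane", "Katrina", True),
    ("Hurricane", "season", False),
    ("named", "Irene", False),
    ("named", "her", False),
]
total, positive = train(training)
print(candidate_names("Hurricane Irene touched North Carolina".split(), total, positive))
print(candidate_names("They named her Irene".split(), total, positive))
```

With these counts, “Irene” is flagged after “Hurricane” but not after “her”, which is the girl-or-hurricane distinction the slides ask about.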
Rule-based IE
Time information:
for example, for “Aug. 27 2011” we can define a regular expression: (Jan|Feb|…|Dec)\.\s\d{2}\s\d{2,4}
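The date rule above can be tried out with Python's `re` module. This is a sketch under assumptions: the month list is spelled out in full, the period and a comma after the day are made optional, one-digit days are accepted, and two-digit years are mapped to 20xx. None of these choices come from the slides.

```python
import re
from datetime import date

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Month abbreviation, optional period, day, optional comma, 4- or 2-digit year.
DATE_RE = re.compile(
    r"\b(" + "|".join(MONTHS) + r")\.?\s(\d{1,2}),?\s(\d{4}|\d{2})\b"
)


def extract_dates(text):
    """Return all calendar dates matched by the rule, skipping invalid ones."""
    found = []
    for m in DATE_RE.finditer(text):
        month = MONTHS.index(m.group(1)) + 1
        day = int(m.group(2))
        year = int(m.group(3))
        if year < 100:
            year += 2000  # assumption: two-digit years mean 20xx
        try:
            found.append(date(year, month, day))
        except ValueError:
            continue  # e.g. "Feb. 31 2011" matches the pattern but is not a real date
    return found


print(extract_dates("ABC news, Aug. 27 2011: Irene touched North Carolina."))
```

Note that the rule only finds explicit dates; resolving a relative phrase such as “in the last week” against the article date is a separate step.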
Geo information:
for example, for “South Florida” we can define a regular expression: (North|South|…)?\s<GeoName>
All geo names can be obtained from the USGS (U.S. Geological Survey).
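The geo rule can be sketched the same way, with the <GeoName> slot filled from a gazetteer. Here `GAZETTEER` is a tiny hypothetical stand-in for a list of USGS names, and the direction words beyond North and South are assumptions.

```python
import re

# Hypothetical stand-in for a gazetteer loaded from USGS geographic names.
GAZETTEER = {"Florida", "North Carolina", "Miami"}

# "North" and "South" come from the rule; the other modifiers are assumptions.
DIRECTIONS = ["North", "South", "East", "West", "Central"]


def build_geo_regex(names):
    """Compile (direction)?\\s<GeoName>, trying longer place names first."""
    alternatives = "|".join(
        re.escape(n) for n in sorted(names, key=len, reverse=True)
    )
    return re.compile(
        r"\b(?:(" + "|".join(DIRECTIONS) + r")\s)?(" + alternatives + r")\b"
    )


def extract_places(text, names=GAZETTEER):
    """Return (direction or None, place name) for every match in the text."""
    return [(m.group(1), m.group(2)) for m in build_geo_regex(names).finditer(text)]


print(extract_places("Flooding in South Florida and North Carolina"))
```

Because the direction prefix is optional, “North Carolina” is still matched as a gazetteer name in its own right, while “South Florida” comes back as a modifier plus the known name “Florida”.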
Abstract Disaster Ontology
[Figure: ontology diagram; node labels listed as extracted]
Destruction, Debris, Floods, Fires, Death, Animal death, Out of service, …
disaster: fires, floods, Hurricanes, Influenza pandemic, Radiological, Terrorism, Thunderstorms and lightning, Tornadoes, Wildfires
Entity: Service, Bus, Highway, Airport, School, Buildings, Residential buildings, Commercial buildings, Bridge, Road, People, Animal, Tree
Initial Seeds Selection
1. FL
2. NY
3. LA
4. CA
5. OK
6. ND
Related Funding
• DHS, “A Research and Educational Framework to Advance Disaster Information Management in Computer Science PhD Programs”
• Eugenio Pino and Family Global Entrepreneurship Center, “All-Hazard Disaster Situation Browser (ADSB) on Mobile Devices”
• DHS, “Information Delivery and Knowledge Discovery for Hurricane Disaster Management”
• NSF CREST, “CREST: Center for Innovative Information Systems Engineering”
• NSF, “CAREER: A Collaborative Adaptive Data Sharing Platform”
• Purdue University/DHS VACCINE CoE, “A Data Mining Framework for Enhancing Emergency Response Situation Reports with Multi-Agency Multi-Party Multimedia Data”
• IBM Faculty Shared University Research Award