Project Proposal

Abstract

This paper outlines our approach, called CrimeRank, for developing
a real-time mobile transit application that incorporates crime data
from the DC metropolitan area into route planning. Our approach is
intended to provide the user with a variety of options to get to
their destination, while providing valuable insight into the amount
of crime along their route and at their destination.


Index Terms

Data Mining, Android, Crime, WMATA,
Visualization, Spatial Data, Transit


I. INTRODUCTION

The growth of mobile devices such as Android phones and the iPhone
has led to an increase in demand for location-aware applications.
In a single quarter, more than 81 million smart phones were sold
[1]. The Android platform accounts for 25.5% of all smart phone
sales, meaning there is a high demand for Android applications
that can perform useful location-based services.


Data for developers on both the iPhone and Android platforms is
widely available. For example, the API for the Washington
Metropolitan Area Transit Authority (WMATA) is free for anyone to
use, and provides useful spatial and temporal data for Metro
riders to find their way around the DC metropolitan area.
Governments and everyday citizens have also provided freely
available crime data that is searchable by anyone.


With widely available data and high demand for mobile devices that
can accurately pinpoint a user’s location, there is an opportunity
for developers to create applications that provide interesting and
insightful information for consumers.



Our proposed approach, called CrimeRank, is to use crime data
provided to us for the DC metropolitan area, combine it with WMATA
data, and perform various spatial data mining techniques to provide
the user with valuable insight into the amount of crime along their
route and at their destination. With this approach, we intend to
provide the user with a variety of choices to get to their
destination, while optimizing for different characteristics, such
as shortest path, quickest route, least amount of crime, and so on.
A simple use case for CrimeRank would involve the user providing
their current location and their desired location, and our
processing framework would calculate the amount of time it would
take for the user to arrive at their destination. Crime details and
a crime “score” for various stops along the route and their final
destination would be provided so the user understands if there are
high crime transfer points or if their destination is in a high
crime area.


Additionally, our system architecture will utilize cloud-based
resources for computation and storage. The limitations of mobile
devices have driven mobile applications to use the cloud for
computation. For example:




• Smart phones have low cost, low power CPUs that are not capable
  of handling large processing loads and have limited multitasking
  support.
• Limited battery life prevents extended computation.
• Storage is limited to several gigabytes (Micro SD cards are
  available up to 32 GB).
• Keyboards are small and cramped.
• Screens are extremely small.


By moving applications to the cloud one is able to take advantage
of the vast computing resources provided by vendors like Google,
Amazon, and Microsoft. The smart phone will simply create a request
and display the results returned from the “Cloud”. Thus, the
majority of the application’s intelligence will reside “in the
cloud”, which will reduce the amount of work done by the smart
phone.



In this paper, we will provide an overview of the related
work in the field of transit and crime mobile applications, then
describe our proposed approach, and outline the
architecture
of our proposed system.

II. RELATED WORK

There is a plethora of mobile applications for consuming mass
transit data in real time. In the DC area alone, there are over 15
Android applications serving up WMATA transit data. All have
similar themes and revolve primarily around giving users
information about arrival times for buses and trains. Some apps
serve both systems and incorporate GPS for quickly locating the
public transportation nearest the user. One application allows for
giving feedback to Metro, similar to other citizen reporting
applications focused on problems within a city. A few apps attempt
to incorporate Twitter data, but none work extensively with
datasets not accessible through the WMATA API.


Washington DC Mobile Transit App

John Brewer, Eric Frohnhoefer, Weihan Yang




Figure 1: DC Metro Transit App for Android

The market for mobile crime applications is far less developed. No
DC-specific Android applications exist. Most crime apps are browser
based and simply display the crime data points on a map, giving
locations of incidents, then allowing the user to drill down and
find out more about events in a particular area. Some applications
simply allow for retrieving regular police reports for a given
precinct. Probably the most similar in nature to the proposed
CrimeRank application is an iPhone app named “AreYouSafe: DC”,
reportedly due to launch soon [2]. Currently, all that they have
made available is a Google Maps based heat map of the District with
no further information (Figure 2). A web-based application,
“Stumble Safely”, uses crime data to find safe walking routes home
from bars in NW DC.



Figure 2: AreYouSafe: DC heat map

As technology becomes more prevalent in everyday life, so too does
information about the world in which we live. Specifically, GIS and
spatial databases allow for mining and exploration of datasets with
regard to their location. This extra level of detail allows for
more opportunity for correlation and drawing helpful conclusions.
Putting the power of GIS based applications in the hands of
millions of people using mobile applications only serves to
accelerate the gathering of the data and the rate at which it
becomes useful to others.

III. PROPOSED APPROACH

Our proposed approach, called CrimeRank, will provide the user with
a rating for the amount of crime along a route that they are
planning to take. Using our mobile application, the user will input
their desired destination. The application will calculate a variety
of routes from the user’s current location, optimizing for a number
of factors, such as least amount of crime, shortest time, shortest
path, and so on. These routes can be a combination of Metrorail,
Metrobus, driving, and/or walking directions.


CrimeRank will be a density-based measure that calculates the
average crime density for an area. There will be four levels: Low,
Average, Moderate, and High. An Average crime density implies an
average amount of crime, while Low will mean one standard deviation
below average. Moderate implies one standard deviation above
average, while High will mean two standard deviations above the
average. Based on the crime density along the various routes that
our application calculates, we will output a score for each route,
allowing the user to choose a route with a low crime rating, or
possibly a faster route with a higher crime rating.
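
The four levels can be illustrated with a minimal Java sketch that
classifies an area by the z-score of its crime density. The class
name CrimeRankLevels, the sample densities, and the exact cut-offs
(at -1, +1, and +2 standard deviations) are assumptions made for
illustration; the text does not fix how values between the named
levels are assigned.

```java
import java.util.Arrays;

/** Hypothetical sketch: maps an area's crime density to a CrimeRank level. */
final class CrimeRankLevels {

    enum Level { LOW, AVERAGE, MODERATE, HIGH }

    /** Arithmetic mean of the densities (0 for an empty array). */
    static double mean(double[] densities) {
        return Arrays.stream(densities).average().orElse(0.0);
    }

    /** Population standard deviation of the densities. */
    static double stdDev(double[] densities) {
        if (densities.length == 0) {
            return 0.0;
        }
        double m = mean(densities);
        double sumSq = 0.0;
        for (double d : densities) {
            sumSq += (d - m) * (d - m);
        }
        return Math.sqrt(sumSq / densities.length);
    }

    /**
     * Classifies a density by its z-score. The cut-offs are an assumed
     * reading of the level definitions, not a confirmed design.
     */
    static Level classify(double density, double mean, double stdDev) {
        if (stdDev == 0.0) {
            return Level.AVERAGE; // no spread, so every area is average
        }
        double z = (density - mean) / stdDev;
        if (z >= 2.0) return Level.HIGH;
        if (z >= 1.0) return Level.MODERATE;
        if (z <= -1.0) return Level.LOW;
        return Level.AVERAGE;
    }

    public static void main(String[] args) {
        // Made-up densities (crimes per unit area) for eight areas.
        double[] densities = {2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0};
        double m = mean(densities);
        double sd = stdDev(densities);
        for (double d : densities) {
            System.out.println(d + " -> " + classify(d, m, sd));
        }
    }
}
```

A route score could then be derived from the levels of the areas
the route passes through, for example by taking the highest level
encountered; that aggregation is likewise left open here.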

IV. SYSTEM ARCHITECTURE

A. Overview


To implement our proposed approach, we plan to use the cloud-based
Amazon EC2 service to host a web server and database. This will
allow us to offload computation from the Android mobile device to a
more powerful server. The web server, which we call the Transit
Server, will host our application, which will interact with the
database to retrieve and process the data for the user.


The crime dataset and static WMATA data will be loaded into the
database, while the real-time WMATA data can be fetched by the web
server when needed. The web server can expose a REST interface for
the Android phone to use to perform queries and retrieve results.
The spatial processing and data mining will run in the web
application, combining our crime and WMATA data and returning the
results to the Android phone. Figure 3 shows the overall
architecture of the system. Figure 4 shows the architecture of the
web application that will be hosted on the web server.



Figure 3: System Architecture

Figure 4: Web Application Architecture

To keep the architecture uniform, we plan on writing a majority of
our code in Java and using free third-party Java libraries to
implement our system. We also plan on utilizing the Android SDK and
the various Google APIs for tools such as Google Maps. Table 1
outlines the various third-party libraries we have chosen to
implement our architecture.


Table 1: Software Components

Component | Selection
Cloud Provider | Amazon EC2
Web Container | Tomcat
Dependency Management | Maven
Java Application Framework | Spring
Database | Postgres and PostGIS
REST library | RESTeasy
Android Development | Android SDK

B. Choosing a Cloud Service

Cloud computing options can be broken down into three categories:
Software as a Service (SaaS), Infrastructure as a Service (IaaS),
and Platform as a Service (PaaS).

Software as a Service vendors provide fully hosted web-based
applications. SaaS vendors typically charge on a per-user basis.
No vendors in this category were evaluated.


Platform as a Service vendors offer a development platform for
developers. Developers write their own code and deploy their
application on the vendor’s platform. The benefit of the PaaS model
is the developer doesn’t have to worry about the maintenance (i.e.
patching, upgrades, etc.) of the platform running the application.


Infrastructure as a Service vendors provide an enterprise grade
physical infrastructure in which a developer can deploy a virtual
machine running their application. New instances can be stopped and
started as demand changes, and the developer pays an hourly rate
per instance. The benefit of the IaaS model is the developer has
full control over the platform, but does not need to be concerned
with the physical infrastructure.


For our application we explored three cloud computing services:
Google App Engine, Microsoft Azure, and Amazon EC2. These vendors
represent the three leaders in cloud computing services. All three
services offer a free pricing tier we can utilize to develop, test,
and deploy our application. Each service was examined based on ease
of setup, runtime familiarity, and geospatial support.


Google App Engine follows the PaaS model and features Java and
Python runtime environments. Persistent storage of data is provided
by a non-relational Datastore. App Engine wasn’t the best option
for this project due to the lack of built-in geospatial
functionality in the App Engine Datastore. A number of projects,
such as GeoModel, GISCloud, and GeoDatastore, aim to add geospatial
support. However, their functionality is limited. Google is
planning to deploy a relational Datastore based on MySQL, which
does have geospatial support.


Microsoft Azure also follows the PaaS model and features a .NET
runtime environment. Persistent storage of data is provided by
Azure SQL, which is a highly distributed version of Microsoft’s SQL
Server. While Azure SQL supports the same geospatial operations as
SQL Server, we ruled out Microsoft Azure because nobody on the team
was familiar with the .NET runtime.


Our last option, Amazon EC2, follows the IaaS model. We selected
Amazon EC2 to host our application. By using EC2 we have greater
control over the environment in which our application will be run.
The added flexibility comes at the cost of increased setup time.
Because Amazon has such a wide user base, we were able to find a
pre-built Ubuntu VM to get us started.

C. Transit Server

The Transit Server web application will be built using Spring to
manage the application framework and RESTeasy to provide management
of the REST interface. Maven will be used to automatically manage
all third-party libraries. Our application will be deployed on a
Tomcat server running in Amazon EC2.


We will use Postgres as our database and PostGIS as the spatial
database extension to Postgres to allow us to store spatial
objects. The database will also be installed on our Amazon EC2
instance and persisted to Amazon Elastic Block Store. Postgres and
PostGIS provide Java libraries that our application can utilize to
perform regular SQL queries and spatial queries. Data will be
retrieved from the database using JDBC and hand-written SQL
statements rather than an ORM library such as Hibernate and
Hibernate Spatial. PostGIS also provides indexing on spatial
objects to help speed up our queries.
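
As a rough illustration of the plain-JDBC style described above,
the following sketch counts crimes within a radius of a point using
the PostGIS function ST_DWithin. The table name crime, the column
name geom, the use of SRID 4326, and the class and method names are
all assumptions for illustration; the actual schema is not given.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

/** Hypothetical sketch of a hand-written spatial query issued over JDBC. */
final class CrimeQueries {

    // Assumed schema: table "crime" with a geometry column "geom" in SRID 4326.
    // Casting both sides to geography makes ST_DWithin take its radius in meters.
    static final String CRIMES_NEAR_POINT_SQL =
        "SELECT COUNT(*) FROM crime "
      + "WHERE ST_DWithin(geom::geography, "
      + "ST_SetSRID(ST_MakePoint(?, ?), 4326)::geography, ?)";

    /** Returns the number of crime records within radiusMeters of the point. */
    static int countCrimesNear(Connection conn, double lon, double lat,
                               double radiusMeters) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(CRIMES_NEAR_POINT_SQL)) {
            ps.setDouble(1, lon); // ST_MakePoint takes x (longitude) first
            ps.setDouble(2, lat);
            ps.setDouble(3, radiusMeters);
            try (ResultSet rs = ps.executeQuery()) {
                rs.next(); // COUNT(*) always yields exactly one row
                return rs.getInt(1);
            }
        }
    }

    public static void main(String[] args) {
        // Without a database at hand, just show the statement that would be sent.
        System.out.println(CRIMES_NEAR_POINT_SQL);
    }
}
```

Using a PreparedStatement keeps the coordinates out of the SQL text
and lets the same statement be reused for every stop or route point
that needs a crime count.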


The REST interface on the server side will return JSON objects to
the Android device. By performing simple POSTs or PUTs of location
data (for example, where the user currently is and where he or she
wants to go), the Android device will be able to retrieve JSON
objects detailing route information and crime information.
Essentially, when the user accesses a REST service on the web
server, the web server will open a connection to the database and
perform the necessary queries to retrieve the desired data. If any
real-time WMATA data is needed, the web server will then use the
WMATA API to retrieve the data. The real-time WMATA data and the
stored crime data will be merged in a post-processing step and then
returned in a JSON format to the user. This sequence is shown in
Figure 5.


A simple use case that demonstrates this sequence has already been
implemented and successfully tested. The mobile device is able to
connect to and retrieve JSON data from the Transit Server.




Figure 5: User query sequence diagram

D. Datasets and Data Ingest

The crime data set contained a number of issues, so the data had to
be “cleaned” before it could be imported. We removed unrecognized
characters and removed entries with invalid locations. Only data
for the most current year (2009) was imported.


Using the pg_read_file() function in PostgreSQL, the crime dataset
was loaded directly into the database as an XML data type. This
allowed us to use XPath to parse the XML and load the data into a
table.


We decided to cache the bus and metro stop locations because the
dataset is relatively static. Doing so allows us to pre-compute our
crime metric and store the results in the database. The locations
of bus and metro stops were obtained through the WMATA API. The XML
was loaded into the database using the pg_read_file() function, and
XPath was used to load the data into a table.


TIGER/Line shape files are provided by the US Census Bureau. These
files contain select geographic information such as roads, water
features, and political boundaries. Using a tool called ogr2ogr, we
imported the boundaries for all counties in which crime information
was available. This allowed us to prune crime data containing
invalid location information. The TIGER/Line data may also help in
computing our CrimeRank metric.

E. Android Client

The target handset for this project is a Verizon Droid (Motorola
A855). This phone runs Android v2.2.2, also known as Froyo. This
corresponds to Android API Level 8. The application will aim to
support older versions of Android, but the focus will be to ensure
that everything runs efficiently on Froyo.


Android runs Java applications on a Java-based framework. A basic
level of working knowledge of the Android stack was necessary to
begin exploring the possibilities within the operating system. To
this end, several tutorials using the Android SDK were completed,
with each aimed at developing paths to provide the necessary
functionality for the CrimeRank application.


The Android SDK was installed along with the Google Maps API
Add-on. A baseline application demonstrating the rough pieces of
the final application was created. This application made initial
contact with the web server to do REST operations (GET, PUT, POST).
This application was tested on both an emulator and the Droid. All
the necessary tools for application development were tested as part
of this initial survey: debugging and logging capabilities from
within Eclipse, along with emulator setup and loading code onto the
device.


Still remaining in the realm of basic functionality to be explored
is an on-board SQLite database. This capability will most likely be
used to store user favorites and their CrimeRanks as well as other
pieces of information needed for quick access.


The development of the Android application will be incremental in
nature. The goal is to grow the application by working on various
pieces of functionality in steps. At the end of each step, the
application will be usable and capable of demonstrating the added
functionality. The application will also undergo testing at the end
of each increment. By doing incremental releases with testing, we
hope to keep the project on track for delivery and reduce the
amount of overall testing near that time. Rather than attempting to
add more developers if we fall behind schedule, the plan will be to
drop functionality.


The goal of the Android application development is to provide an
interface for the user to interact with the CrimeRank algorithm in
a meaningful way. The pages of the application should be designed
so that the user experience is both simple and efficient. Basic
WMATA functionality will mirror existing Metro applications to meet
the expectations of current users of such applications. The
CrimeRank algorithm will be on display throughout the application
and uniquely integrated with the WMATA dataset.




The first and most basic piece of functionality to be delivered is
a list of WMATA locations nearest the user. The GPS on board the
phone will provide the desired location. A simple nearest neighbor
query on the Metro database should return the information. This
will be displayed as a scrollable list displaying the distances to
each stop. The user will be able to pull up a Google Map window
with all the points displayed.
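
The nearest neighbor lookup can be sketched in plain Java as a sort
by great-circle (haversine) distance. This in-memory version is an
illustrative stand-in for the database query described above, not
the planned implementation; the class NearestStops, the Stop type,
and the approximate station coordinates are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: finds the k stops closest to a GPS fix. */
final class NearestStops {

    static final double EARTH_RADIUS_METERS = 6_371_000.0;

    /** A transit stop with its position in decimal degrees. */
    static final class Stop {
        final String name;
        final double lat;
        final double lon;

        Stop(String name, double lat, double lon) {
            this.name = name;
            this.lat = lat;
            this.lon = lon;
        }
    }

    /** Great-circle distance in meters between two lat/lon points. */
    static double haversineMeters(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_METERS * Math.asin(Math.sqrt(a));
    }

    /** Returns up to k stops ordered from nearest to farthest. */
    static List<Stop> nearest(List<Stop> stops, double lat, double lon, int k) {
        List<Stop> sorted = new ArrayList<>(stops);
        sorted.sort(Comparator.comparingDouble(
                (Stop s) -> haversineMeters(lat, lon, s.lat, s.lon)));
        return new ArrayList<>(sorted.subList(0, Math.min(k, sorted.size())));
    }

    public static void main(String[] args) {
        // Approximate coordinates, for illustration only.
        List<Stop> stops = Arrays.asList(
                new Stop("Union Station", 38.8977, -77.0074),
                new Stop("Foggy Bottom", 38.9009, -77.0505),
                new Stop("Metro Center", 38.8983, -77.0281));
        for (Stop s : nearest(stops, 38.8977, -77.0365, 2)) {
            System.out.println(s.name + ": "
                    + Math.round(haversineMeters(38.8977, -77.0365, s.lat, s.lon)) + " m");
        }
    }
}
```

Sorting every stop is adequate for a few hundred stations; on the
server, a spatially indexed database query would avoid scanning the
full table.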


The next piece of functionality will be to give the user an
immediate assessment of his or her current location’s crime level.
The same GPS based location will be used to query the web server,
and a custom CrimeRank score will be returned and displayed. Along
with the custom score, the list of close metro stops will now have
crime scores attached as well. This information will have been
previously calculated and the number will be provided to the user.


A route planner will also be developed. The first step will be to
use the WMATA API to return possible itineraries. Again, these will
be displayed as a scrollable list. Then the CrimeRank will be added
to the itineraries, giving the user the ability to make tradeoffs
with regard to safety and/or expediency. An effort will be made to
display a chosen itinerary in Google Maps. It has yet to be
determined whether visual displays of CrimeRank will be made
available to the user. This is a piece of functionality that is set
to be added as the schedule allows.
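
One simple way to expose the safety versus expediency tradeoff is
to offer the same itineraries in two orderings. The sketch below
assumes each itinerary carries a travel time and a numeric crime
score; the Itinerary type, its fields, and the sample values are
hypothetical and do not reflect the WMATA API's response format.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: orders candidate itineraries by speed or by crime score. */
final class ItineraryRanking {

    /** One candidate route; the fields are illustrative only. */
    static final class Itinerary {
        final String label;
        final int minutes;       // total travel time
        final double crimeScore; // higher means more crime along the route

        Itinerary(String label, int minutes, double crimeScore) {
            this.label = label;
            this.minutes = minutes;
            this.crimeScore = crimeScore;
        }
    }

    /** Shortest travel time first. */
    static List<Itinerary> fastestFirst(List<Itinerary> itineraries) {
        List<Itinerary> sorted = new ArrayList<>(itineraries);
        sorted.sort(Comparator.comparingInt((Itinerary i) -> i.minutes));
        return sorted;
    }

    /** Lowest crime score first; ties are broken by travel time. */
    static List<Itinerary> safestFirst(List<Itinerary> itineraries) {
        List<Itinerary> sorted = new ArrayList<>(itineraries);
        sorted.sort(Comparator.comparingDouble((Itinerary i) -> i.crimeScore)
                .thenComparingInt(i -> i.minutes));
        return sorted;
    }

    public static void main(String[] args) {
        // Made-up itineraries for illustration.
        List<Itinerary> options = Arrays.asList(
                new Itinerary("Red Line direct", 30, 5.0),
                new Itinerary("Bus then Blue Line", 45, 1.0),
                new Itinerary("Walk then Orange Line", 40, 1.0));
        System.out.println("Fastest: " + fastestFirst(options).get(0).label);
        System.out.println("Safest:  " + safestFirst(options).get(0).label);
    }
}
```

Both methods return new lists, so the scrollable list on the phone
can switch between orderings without re-querying the server.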


In parallel with the development of those pieces of
functionality, the look and feel of the app will be determined.
Sketches of each page of
the application will be made and
evaluated for usability. These sketches will then be translated
into XML layouts for use in Android. The flow between
pages, user menus, and application settings will all have
associated graphical user interfaces that need
to be designed.


V. PERFORMANCE AND EVALUATION

A. Performance

We will make metrics gathering an integral part of our
application so we can measure the speed and responsiveness
of our queries. Any mobile or web application needs to
respond to the user in a
reasonable amount of time. After the
implementation is complete, we will examine the speed of our
queries to see if the spatial queries, as we’ve implemented
them, are sufficiently quick for our application.
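
A lightweight way to gather such timings is to wrap each query in a
helper that records elapsed wall-clock time. The sketch below is
one possible shape for that helper; the QueryTimer and Timed names
are hypothetical, and the loop in main is only a stand-in for a
real database call.

```java
import java.util.function.Supplier;

/** Hypothetical sketch: measures how long a unit of work takes. */
final class QueryTimer {

    /** A result paired with the time it took to produce. */
    static final class Timed<T> {
        final T result;
        final long elapsedNanos;

        Timed(T result, long elapsedNanos) {
            this.result = result;
            this.elapsedNanos = elapsedNanos;
        }

        double elapsedMillis() {
            return elapsedNanos / 1_000_000.0;
        }
    }

    /** Runs the work once and records the elapsed time. */
    static <T> Timed<T> time(Supplier<T> work) {
        long start = System.nanoTime(); // monotonic clock, suited to intervals
        T result = work.get();
        return new Timed<>(result, System.nanoTime() - start);
    }

    public static void main(String[] args) {
        // Stand-in workload; a real caller would run a spatial query here.
        Timed<Integer> timed = time(() -> {
            int sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i % 7;
            }
            return sum;
        });
        System.out.println("result=" + timed.result
                + " took " + timed.elapsedMillis() + " ms");
    }
}
```

Logging these timings per query type would show which spatial
queries are slow enough to need an index or pre-computation.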


Postgres and PostGIS provide indexing capabilities for
spatial objects that we can use to optimize and speed up our
queries. We will add spatial indexes where necessary based
on an analysis of the types of queries we typically perform.

B. User Experience

Another aspect of this project that will need to be evaluated is
the design of the user interface. We will have to examine the look
and feel of the interface on the Android device to determine if it
is intuitive and easy to use.

VI. SCHEDULE

Table 2 outlines the schedule our team will follow.

Table 2: Project Schedule

Date | Deliverable
2/21 | Project Proposal
4/4 | Project Checkpoint
5/2 | Project Presentation
5/9 | Project Report


VII. PROGRESS REPORTS

John Brewer

Task | Comments | Status
Set up development environment. | Eclipse, Android SDK, Google Maps API, etc. are all configured. Debugger and logger are working. | COMPLETED
Make REST connection. | Basic get, put, post operations tested on emulator and handset. | COMPLETED
Design Application Page Layout/Flow | Sketches and flow diagrams will be developed to guide the process. These will later be translated to XML. | IN WORK
Closest Metro Stops | A simple interaction with the WMATA API to get started. | IN WORK
Incorporate CrimeRank with GPS and Metro. | Return crime scores for current location as well as desired metro stops. | IN WORK
Trip Planner | Return list of itineraries along with CrimeRank score to allow for tradeoffs. | IN WORK
Google Maps Visuals | Give the user visuals of immediate vicinity and/or chosen routes. | IN WORK
Testing | Testing of each new piece of functionality, testing of the user interface, etc. will be occurring throughout the development process. | ONGOING



Eric Frohnhoefer

Task | Comments | Status
Evaluated various cloud vendors for our application. | Amazon EC2 selected. | COMPLETED
Deploy Ubuntu virtual machine on Amazon EC2. | Instance stored on S3. DB data files and Tomcat stored on EBS for persistence. | COMPLETED
Install Tomcat, PostgreSQL, and PostGIS. Provide access to group members. | Need to work with groups to determine queries so that indexes can be added. | COMPLETED
Ingest crime data set, static metro data, and TIGER/Line Data. | Only 2009 data used. Alexandria also not imported due to lack of valid dates. | COMPLETED
Work with group to design and evaluate CrimeRank algorithm. | | IN WORK



Weihan Yang

Task | Comments | Status
Select libraries to be used in the web application | | COMPLETED
Design framework for the web application | Spring application framework implemented, tested, and deployed to the EC2 instance. | COMPLETED
Design and implement an HTTP interface for the Android phone to use | Need to work with John to figure out what queries the phone will be performing. | IN WORK
Implement spatial queries to allow the CrimeRank algorithm and Android device to access the needed data | Implemented a simple query to find all crimes within some distance of each Metro station. Need to decide what other queries are necessary. | IN WORK
Integrate real-time WMATA data into the queries | Have to access the WMATA API and combine with the results from the database queries before returning the results to the mobile device. | IN WORK


REFERENCES

1. Gartner Says Worldwide Mobile Phone Sales Grew 35 Percent in
   Third Quarter 2010; Smartphone Sales Increased 96 Percent. 2010
   [cited 2/18/2011]; Available from:
   http://www.gartner.com/it/page.jsp?id=1466313.

2. Are You Safe? [cited 2/20/2011]; Available from:
   http://areyousafedc.com/.