D.C. on the Map

2010





SI 601
Mini Project Write Up




Huang Huang

Gin Corden

Xiaoxi Zhang






1 Introduction

1.1 Purpose of Project

The purpose of our project is to analyze data from the city of Washington, DC using Many Eyes and
Google Maps. We acquired datasets about businesses and crime from the city web site and used Perl
scripts to clean and manipulate them.

(1) We created a Google Map of Washington, DC showing the locations of a random subset of licensed
businesses classified as Housing-Residential, meaning primarily apartment-style housing. In addition,
we cleaned data from another report titled “Crime Incidents 2009” and showed the locations of
homicides on the same Google Map. With this map, we can analyze a potential relationship between
the locations of these businesses and crime incidents.


(2) We drew a scatter plot on Many Eyes showing the correlation between basic business license fees
and the number of companies, with each data point corresponding to a license category. We also
created a pie chart of the number of licensed companies per license category and a bar chart of the
average fee per license category. These give an idea of the quantity and type of licensed
businesses in DC, as well as an estimate of the total fees collected by the city government.

1.2 Introduction of Data Sources (from http://data.octo.dc.gov/)

1.2.1 Basic Business License Categories and Fees

The dataset of Basic Business License Categories and Fees (bbl-categories) provides a list of the
business license categories in the District of Columbia and their associated fees. The Department of
Consumer & Regulatory Affairs (DCRA) licenses businesses in more than 150 categories. The license fees
do not include any fees for incorporating or registering any trade name with the District of Columbia.
The fees also do not include certificate of occupancy fees. The mission of the Department of Consumer
and Regulatory Affairs is to protect the health, safety, economic interests, and quality of life of
residents, businesses, and visitors in the District of Columbia by issuing licenses and permits,
conducting inspections, enforcing building, housing, and safety codes, regulating land use and
development, and providing consumer education and advocacy services.

1.2.2 Basic Business Licenses

In order to operate legally in the District of Columbia, most businesses must get a Basic Business
License (BBL) from DCRA. The Basic Business License (BBL) Program streamlines District of Columbia
business licensing procedures. The BBL groups licenses by the type of business activity and
regulatory approvals required.

The dataset shows issued business licenses in the following categories: all housing licenses, motor
vehicle sales, service and repair, and home improvement contractors.

1.2.3 Crime Incidents

All statistics presented in this dataset (name: crime) are based on preliminary DC Code Index crime
data. Data reflects crimes reported at least two business days before today's date. All statistics are
subject to change due to a variety of reasons, such as a change in classification, the determination
that certain offense reports were unfounded, or late reporting.

In addition, the preliminary data do not represent official statistics submitted to the FBI under the
Uniform Crime Reporting program (UCR). All preliminary offenses are coded based on DC Code and not
the FBI offense classifications. Any comparisons between DC Code preliminary data and the official
crime statistics published by the FBI under the UCR Program are inaccurate and misleading because of
expected differences between local and national offense classifications.

The MPDC does not guarantee (either expressed or implied) the accuracy, completeness, timeliness, or
correct sequencing of the information. The MPDC will not be responsible for any error or omission, or
for the use of, or the results obtained from the use of this information.

All data visualizations on maps should be considered approximate, and attempts to derive specific
addresses are strictly prohibited. The reports provided by this application include only events that
can be mapped. This limitation excludes approximately 3 percent of the data.

1.3 Introduction of Google Map API


Google Maps is a web technology that provides free mapping and directions to users, along with a
lot of other geographically based information like transit. It is based on JavaScript and XML, and is
very customizable.

The Google Maps API is a tool allowing third party users to create their own map and embed it in
their web site. Essentially you get Google Maps functionality on your own site, with your own data
points and other customizations. There are also APIs specifically for Flash, one designed to help
developers manage geodata, one for the generation of static maps, and one that aids in the creation
of mini-applications that run in a map.
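
As a rough illustration of supplying "your own data points", the Perl sketch below turns a list of
coordinates into JavaScript marker statements in the style of version 3 of the Google Maps JavaScript
API. It is not taken from the project scripts, and the project may have used a different API version.
The sub name marker_js, the sample points, and the JavaScript variable "map" are assumptions made
for illustration.

```perl
use strict;
use warnings;

# Build one JavaScript statement that places a marker on a map.
# Assumes a JavaScript variable named "map" already holds the map object.
sub marker_js {
    my ($lat, $lng, $title) = @_;
    $title =~ s/(["\\])/\\$1/g;    # escape quotes and backslashes for JavaScript
    return sprintf(
        'new google.maps.Marker({position: new google.maps.LatLng(%.6f, %.6f), map: map, title: "%s"});',
        $lat, $lng, $title);
}

# Hypothetical sample points (latitude, longitude, label)
my @points = (
    [38.9072, -77.0369, 'Sample housing license'],
    [38.8951, -77.0364, 'Sample incident'],
);

print marker_js(@$_), "\n" for @points;
```

The printed statements could be pasted into the script section of a page that has already created
the map object.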

2 Data Manipulation

2.1 Businesses and Crime Incidents on Google Map

2.1.1 Purpose

Here we wanted to display some business and crime data together, to look for patterns or
relationships.
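
The data preparation behind such a map can be sketched in Perl as follows: keep only the homicide
rows from the crime data and draw a random subset of the business rows. The column layout, the
offense label HOMICIDE, the sub names, and the sample rows are assumptions for illustration, not the
layout of the actual city files; the scripts we used are in the zip file listed in section 3.1.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Keep only rows whose offense column equals the wanted offense.
# Assumed layout: offense <TAB> latitude <TAB> longitude
sub filter_offense {
    my ($wanted, @rows) = @_;
    return grep { my @f = split /\t/, $_; $f[0] eq $wanted } @rows;
}

# Return a random subset of at most $n items.
sub random_subset {
    my ($n, @items) = @_;
    my @shuffled = shuffle(@items);
    $n = @shuffled if $n > @shuffled;
    return @shuffled[0 .. $n - 1];
}

# Hypothetical sample rows
my @crimes = (
    "HOMICIDE\t38.90\t-76.98",
    "THEFT\t38.91\t-77.03",
    "HOMICIDE\t38.86\t-76.99",
);
my @homicides = filter_offense('HOMICIDE', @crimes);
my @sample    = random_subset(2, map { "business $_" } 1 .. 10);

print scalar(@homicides), " homicides, ", scalar(@sample), " sampled businesses\n";
```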

2.1.2 Visualization and Analysis







While the blue housing markers are fairly evenly distributed (the big blank area near the top is
Rock Creek Park; the blank area in the middle is the heart of downtown), the red homicide markers are
concentrated in the eastern half of the city. We added custom info windows to our markers, and a
review of those for blue markers clustered near red markers shows some propensity towards one-family
rental units and smaller apartment houses.
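
One way to make "clustered near" more concrete would be to measure the distance from each housing
marker to the nearest homicide marker. The sketch below uses the haversine great-circle formula. It
is offered as a possible follow-up rather than a description of what the project scripts do; the
coordinates, the 0.5 km threshold, and the sub names are illustrative assumptions.

```perl
use strict;
use warnings;
use List::Util qw(min);

my $PI              = 4 * atan2(1, 1);
my $EARTH_RADIUS_KM = 6371;    # mean Earth radius

# Great-circle distance in km between two (lat, lng) points given in degrees.
sub haversine_km {
    my ($lat1, $lng1, $lat2, $lng2) = map { $_ * $PI / 180 } @_;
    my $dlat = $lat2 - $lat1;
    my $dlng = $lng2 - $lng1;
    my $hav  = sin($dlat / 2) ** 2
             + cos($lat1) * cos($lat2) * sin($dlng / 2) ** 2;
    return 2 * $EARTH_RADIUS_KM * atan2(sqrt($hav), sqrt(1 - $hav));
}

# Distance from one point to the closest point in a list.
sub nearest_km {
    my ($point, @others) = @_;
    return min(map { haversine_km(@$point, @$_) } @others);
}

# Hypothetical coordinates
my @housing   = ([38.900, -76.990], [38.950, -77.060]);
my @homicides = ([38.901, -76.991], [38.860, -76.980]);

for my $place (@housing) {
    my $d = nearest_km($place, @homicides);
    printf "%.3f,%.3f nearest homicide %.2f km%s\n",
        @$place, $d, ($d <= 0.5 ? " (near)" : "");
}
```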


2.2 Correlation of BBL and BBL Categories

2.2.1 Purpose

In this part, we cleaned the data in the bbl and bbl-categories datasets. Then we drew scatter plots
on Many Eyes showing the correlation between basic business license fees and the number of companies,
with each data point corresponding to a license category.

2.2.2 Visualization and Analysis


(1) Scatter Plot of Amount of Companies and License Fee






(URL: http://manyeyes.alphaworks.ibm.com/manyeyes/visualizations/scatterplot-correlation/comments/bf0a429421a311df9e24000255111976)

This visualization shows the correlation between the business license application fee and the number
of companies in the corresponding category. We took two steps to produce it. First, we eliminated
companies that do not have clear BBL category information. Second, we calculated the average license
fee of each BBL category.

All of the licensed businesses we obtained fall under four categories: Housing: Transient;
Housing: Residential; General Service and Repairs; Motor Vehicle Sales, Service and Repair. From the
above image we can conclude that there is no direct or linear relationship between the number of
companies and license fees.
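
A numerical check to accompany the scatter plot is the Pearson correlation coefficient. The sketch
below is an illustration rather than part of our scripts: it uses the category shares and average
fees reported in the pie chart and bar chart sections, and it assumes the shares are proportional to
the company counts, in which case they give the same coefficient as the counts would. The sub name
pearson is our own.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Pearson correlation coefficient of two equal-length lists (array references).
sub pearson {
    my ($x, $y) = @_;
    my $n      = @$x;
    my $mean_x = sum(@$x) / $n;
    my $mean_y = sum(@$y) / $n;
    my ($sxy, $sxx, $syy) = (0, 0, 0);
    for my $i (0 .. $n - 1) {
        my ($dx, $dy) = ($x->[$i] - $mean_x, $y->[$i] - $mean_y);
        $sxy += $dx * $dy;
        $sxx += $dx * $dx;
        $syy += $dy * $dy;
    }
    return $sxy / sqrt($sxx * $syy);
}

# Order: Housing: Transient, Housing: Residential,
#        General Service and Repairs, Motor Vehicle Sales, Service and Repair
my @share = (54.8, 5.4, 34.6, 5.4);          # percent of companies (pie chart)
my @fee   = (180.86, 213.25, 244.20, 346);   # average fee in dollars (bar chart)

printf "r = %.2f\n", pearson(\@share, \@fee);
```

With only four categories, any coefficient obtained this way is weak evidence and should be read
with the same caution as the plot itself.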


(2) Pie Chart of Amount of Companies






(URL: http://manyeyes.alphaworks.ibm.com/manyeyes/visualizations/pie_chart_amount_of_comanies/comments/4ac1c91221a211df9f90000255111976)

This image shows the distribution of companies across categories:

Housing: Transient                         54.8%
Housing: Residential                       5.4%
General Service and Repairs                34.6%
Motor Vehicle Sales, Service and Repair    5.4%


(3) Bar Chart of License Fee


(URL: http://manyeyes.alphaworks.ibm.com/manyeyes/visualizations/bar_chart_license_fee/comments/2a3174a821a311dfa94f000255111976)





This image shows the average license fee for each category:

Housing: Transient                         $180.86
Housing: Residential                       $213.25
General Service and Repairs                $244.20
Motor Vehicle Sales, Service and Repair    $346
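
The estimate of total fees mentioned in the introduction amounts to multiplying each category's
company count by its average fee and adding the products. A minimal sketch follows; the average fees
are the ones listed above, while the company counts and the sub name estimate_total_fees are
hypothetical placeholders, since this write-up reports only percentage shares.

```perl
use strict;
use warnings;

# Estimated fee revenue: for each category, company count times average fee.
sub estimate_total_fees {
    my ($count, $avg_fee) = @_;
    my $total = 0;
    for my $category (keys %$count) {
        next unless exists $avg_fee->{$category};    # skip categories without a fee
        $total += $count->{$category} * $avg_fee->{$category};
    }
    return $total;
}

# Average fees from the bar chart
my %avg_fee = (
    'Housing: Transient'                      => 180.86,
    'Housing: Residential'                    => 213.25,
    'General Service and Repairs'             => 244.20,
    'Motor Vehicle Sales, Service and Repair' => 346,
);

# Hypothetical company counts, for illustration only
my %count = (
    'Housing: Transient'                      => 100,
    'Housing: Residential'                    => 10,
    'General Service and Repairs'             => 60,
    'Motor Vehicle Sales, Service and Repair' => 10,
);

printf "Estimated total fees: \$%.2f\n", estimate_total_fees(\%count, \%avg_fee);
```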

3 Scripts

3.1 Scripts for Google Map

Flow chart of process:


URL for zip file of scripts:

http://gcorden.people.si.umich.edu/DC_map_project.zip

3.2 Scripts for Correlation

# #############################################################
# Clean the data in dataset bbl-categories.
# Drop repeated categories and list all unique category names.
# A hash tracks the names already seen; the unique names go into @ctgarray.

open(IN, "bbl_categories.txt") || die "couldn't open file\n";

# read in the first line and skip it
$line = <IN>;

my %seen = ();

# loop through each remaining line
while ($line = <IN>) {
    chomp $line;

    # split the line on tabs
    ($area, $categoryname, $licensefee, $applicationfee, $endosmentfee, $unit, $total) =
        split(/\t/, $line);

    # remove double quotes from the category name
    $area =~ s/\"//g;

    # keep only the first occurrence of each category
    $seen{$area}++;
    next if $seen{$area} > 1;

    push(@ctgarray, $area);
}

#print join("\n", @ctgarray);
#print OUT join("\n", @ctgarray);

# close the file
close(IN);


##############################################################
# Clean the data in dataset bbl.
# Modify category names so that they match the names in dataset bbl-categories.
# Push all category names into @categoryarray.

open(IN, "bbl.txt") || die "couldn't open file\n";

# read in the first line and skip it
$line = <IN>;

while ($line = <IN>) {
    chomp $line;

    # split the contents on each tab
    ($corporate, $trade, $address1, $address2, $num, $date, $category, $activity, $status, $statusdate) =
        split(/\t/, $line);

    # remove double quotes from the category name
    $category =~ s/\"//g;

    if ($category eq "General Services and Repair") {
        $category = "General Service and Repairs";
    }
    if ($category eq "Motor Vehicle Sales, Service, and Repair") {
        $category = "Motor Vehicle Sales, Service and Repair";
    }

    push(@categoryarray, $category);
}

#print join("\n", @categoryarray);

close(IN);


################################################################
# Clean the "total" field in dataset bbl-categories.
# Drop "total" values that contain letters; keep only numerical data.
# Collect the totals of each category and calculate the average total license fee.

use List::Util qw(sum);

open(IN, "bbl_categories.txt") || die "couldn't open file\n";

# read in the first line and skip it
$line = <IN>;

# loop through each remaining line
while ($line = <IN>) {
    chomp $line;

    ($area, $categoryname, $licensefee, $applicationfee, $endosmentfee, $unit, $total) =
        split(/\t/, $line);

    # remove double quotes and dollar signs from the total
    $total =~ s/\"//g;
    $total =~ s/\$//g;

    next if ($total eq "");

    if (($area eq "Housing: Transient") && ($total !~ m/[A-Za-z]+/)) {
        push(@feearray1, $total);
    }
    if (($area eq "Housing: Residential") && ($total !~ m/[A-Za-z]+/)) {
        push(@feearray2, $total);
    }
    if (($area eq "General Service and Repairs") && ($total !~ m/[A-Za-z]+/)) {
        push(@feearray3, $total);
    }
    # this comparison includes the double quotes, since $area is not unquoted in this loop
    if (($area eq "\"Motor Vehicle Sales, Service and Repair\"") && ($total !~ m/[A-Za-z]+/)) {
        push(@feearray4, $total);
    }
}

# average fee per category: sum of the totals divided by the number of totals
$average1 = sum(@feearray1) / @feearray1;
$avghash{"Housing: Transient"} = $average1;

$average2 = sum(@feearray2) / @feearray2;
$avghash{"Housing: Residential"} = $average2;

$average3 = sum(@feearray3) / @feearray3;
$avghash{"General Service and Repairs"} = $average3;

$average4 = sum(@feearray4) / @feearray4;
$avghash{"Motor Vehicle Sales, Service and Repair"} = $average4;

# close the file
close(IN);


########################################################################
# Print the unique category names, the amount of companies in each category
# and the average license fee.

open(OUT, "> output.txt") || die "couldn't open output file\n";
print OUT ("Category Name\tAmount of Companies\tLicense Fee\n");

for $cgt (@ctgarray) {
    $numhash{$cgt} = 0;

    # count the licenses in dataset bbl that belong to this category
    for $cgtref (@categoryarray) {
        if ($cgt eq $cgtref) {
            #print OUT "$cgt,$cgtref\n";
            $numhash{$cgt}++;
            #print OUT "$numhash{$cgt}\n";
        }
    }

    print "$cgt\t$numhash{$cgt}\t$avghash{$cgt}\n";
    print OUT ("$cgt\t$numhash{$cgt}\t$avghash{$cgt}\n");
}

close(OUT);
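
The nested loop above compares every category against every license record. A single pass with a
hash gives the same counts. The sketch below is an alternative for illustration, not the script we
ran; it assumes the category names have already been normalized as in the earlier step, and the sub
name count_by_category and the sample data are hypothetical.

```perl
use strict;
use warnings;

# Count how many license records fall in each category in one pass.
sub count_by_category {
    my (@categories) = @_;
    my %count;
    $count{$_}++ for @categories;
    return %count;
}

# Hypothetical sample of already-normalized category names
my @categoryarray = (
    'Housing: Residential',
    'Housing: Transient',
    'Housing: Residential',
    'General Service and Repairs',
);

my %numhash = count_by_category(@categoryarray);
print "$_\t$numhash{$_}\n" for sort keys %numhash;
```

Categories that never occur in the license records are simply absent from the hash, so a caller
that needs a zero for them would have to supply it.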