Chicago Crime in R

Theft in Chicago

Here I will show how to set up RStudio to analyse some crime statistics for the city of Chicago.

1) We start by running a setup script,

source("~/Desktop/Chicago_crime1/crime_preamble")

which contains the following code:

# Install the packages (only needed once; comment this out afterwards)
install.packages(c("sp", "raster", "maptools", "mapdata", "mapproj",
                   "ggmap", "DeducerSpatial", "rworldmap", "geosphere"))

# Load the packages used in this post (each one only once)
library(maps)
library(mapdata)
library(sp)
library(raster)
library(maptools)
gpclibPermit()   # opt in to maptools' gpclib-based polygon code (see below)
library(geosphere)
library(ggplot2)
library(ggmap)


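Loading maptools prints a note saying that polygon geometry computations in maptools depend on the gpclib package, which has a restricted licence, and that it is disabled by default; to enable it, type gpclibPermit(). In my setup this returned FALSE, which I believe simply means the gpclib package itself is not installed; none of the steps below rely on it.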

> gpclibPermit()
[1] FALSE

We will now show how to obtain a map of a city and plot a handful of points that represent criminal events. In this example the city is Chicago and the crime is THEFT. As with any data, one needs to really understand what the data contain and how they were obtained, and therefore be aware of any limitations or biases that could lead to incorrect conclusions.

In this dataset, a potential deal-breaker is that, apparently, the University of Chicago Police Department (UCPD) does not have to report, or does not report, some crimes to the Chicago Police Department (CPD), which is where this data comes from. Hmmm! In the following example we will ignore the fact that there are other crimes and, indeed, that the classifier THEFT has many meanings. How many? The data comes from the City of Chicago Data Portal. Clicking on the About tab (the maroon info tab) and scrolling to the bottom gives a link to the list of Chicago Police Department – Illinois Uniform Crime Reporting (IUCR) codes, where we can see that there are 9 sub-types of THEFT.

Even before we load the data, we can ask for the map of the City of Chicago and display it with

# Black-and-white map of Chicago at zoom level 12
ChicagoMap = qmap("Chicago", zoom = 12, color = "bw", legend = "topleft")
ChicagoMap
# Color map at a higher zoom level (14); printing the object draws it
ChicagoMap = qmap("Chicago", zoom = 14, color = "color", legend = "topleft")
ChicagoMap

The last two lines give a color map at higher zoom: a picture of downtown Chicago.

[Figure: color map of downtown Chicago at zoom 14]

Yet another way of displaying a map is with the get_map() and ggmap() functions. If we define a set of coordinates, in this case longitudes and latitudes, we can center the map on the mean of each vector:


lon <- c(-87.647248118, -87.647938205, -87.662544385, -87.751958893, -87.720089827,
         -87.756977729, -87.673019415, -87.740255128, -87.701586039)
lat <- c(41.954492668, 41.774133601, 41.768086308, 41.879527422, 41.855245158,
         41.89515102, 41.997922563, 41.881212919, 41.946538954)
df <- data.frame(lon = lon, lat = lat)
> df
        lon      lat
1 -87.64725 41.95449
2 -87.64794 41.77413
3 -87.66254 41.76809
4 -87.75196 41.87953
5 -87.72009 41.85525
6 -87.75698 41.89515
7 -87.67302 41.99792
8 -87.74026 41.88121
9 -87.70159 41.94654

# Road map centered on the mean longitude and latitude of the points
map_of_Chicago <- get_map(location = c(lon = mean(df$lon), lat = mean(df$lat)),
                          zoom = 12, maptype = "roadmap", scale = 2)
ggmap(map_of_Chicago)

[Figure: road map centered on the mean of the coordinates, zoom 12]

All that remains is to add the points above to the map:


# fill and alpha are fixed values, so they go outside aes(); this keeps the
# points really red and creates no legend that would need hiding
ggmap(map_of_Chicago) +
  geom_point(data = df, aes(x = lon, y = lat),
             fill = "red", alpha = 0.8, size = 5, shape = 21)

[Figure: the nine points plotted on the road map]

The function warns that some points were removed because they fall outside the map's bounding box, which for now is set only by the center and the zoom level!
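To see which points get dropped, we can test each one against the map's bounding box ourselves. A minimal sketch follows; the box numbers are hypothetical and chosen only for illustration (they are not the real extent of map_of_Chicago), and in_bbox() is a hypothetical helper. For a map returned by get_map(), I believe the real box is stored in attr(map_of_Chicago, "bb") with columns ll.lat, ll.lon, ur.lat and ur.lon, but treat that as an assumption and check it on your own object.

```r
# Hypothetical bounding box with illustrative numbers; NOT the real extent
# of map_of_Chicago (which I believe lives in attr(map_of_Chicago, "bb")).
bb <- list(ll.lon = -87.81, ll.lat = 41.80, ur.lon = -87.59, ur.lat = 41.96)

# The nine points from above
df <- data.frame(
  lon = c(-87.647248118, -87.647938205, -87.662544385, -87.751958893, -87.720089827,
          -87.756977729, -87.673019415, -87.740255128, -87.701586039),
  lat = c(41.954492668, 41.774133601, 41.768086308, 41.879527422, 41.855245158,
          41.89515102, 41.997922563, 41.881212919, 41.946538954)
)

# Hypothetical helper: TRUE for each row inside the box (edges included)
in_bbox <- function(d, bb) {
  d$lon >= bb$ll.lon & d$lon <= bb$ur.lon &
    d$lat >= bb$ll.lat & d$lat <= bb$ur.lat
}

inside <- in_bbox(df, bb)
which(!inside)  # rows 2, 3 and 7 lie outside this particular box
df[inside, ]    # the points that would survive the plot
```

With this box, the two southernmost points and the northernmost one fall outside, which is the kind of silent loss the warning is telling us about.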

2) Let's plot the entire crime dataset from 2008 to 2014, but only include THEFT (any kind of theft!). Here we load the dataset, check the headings, choose three of the columns and rename them.


mydata3 = read.csv("~/Chicago_crime1/Crime_chicago_2008plus.csv")
head(mydata3)
# Keep three columns and give them short names
mydata3 <- mydata3[, c(2, 6, 7)]
names(mydata3) <- c("offense", "lati", "long")
head(mydata3)

We should see


offense lati long
1 THEFT 41.92707 -87.74980
2 BATTERY 41.90886 -87.73002
3 WEAPONS VIOLATION 41.96282 -87.71545
4 PUBLIC PEACE VIOLATION 41.92373 -87.75628
5 NARCOTICS 41.90470 -87.71159
6 ASSAULT 41.90202 -87.76640

If we plot all of the crime on the map, things get rather busy, and it takes a while to draw...

[Figure: all crime in the dataset plotted on the map]


ggmap(map_of_Chicago) +
  geom_point(data = mydata3, aes(x = long, y = lati),
             fill = "red", alpha = 0.8, size = 1, shape = 21)

Using the function nrow(), we see that our dataset has 2107991 entries, of which we selected the 458058 that are classified as THEFT. Furthermore, we will only select STREET theft for the examples below.
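Selecting one offense is a plain logical subset on the offense column, and nrow() counts what is left. A minimal sketch using the six rows printed by head(mydata3) above, typed in by hand as a stand-in for the full file (with the counts quoted above, THEFT is about 21.7% of all entries):

```r
# The six rows shown by head(mydata3) above, typed in by hand
mydata3 <- data.frame(
  offense = c("THEFT", "BATTERY", "WEAPONS VIOLATION",
              "PUBLIC PEACE VIOLATION", "NARCOTICS", "ASSAULT"),
  lati = c(41.92707, 41.90886, 41.96282, 41.92373, 41.90470, 41.90202),
  long = c(-87.74980, -87.73002, -87.71545, -87.75628, -87.71159, -87.76640)
)

nrow(mydata3)            # total number of entries

# Keep only the rows whose offense is THEFT (works for character or factor)
mydata3_THEFT <- mydata3[mydata3$offense == "THEFT", ]
nrow(mydata3_THEFT)      # number of THEFT entries

table(mydata3$offense)   # counts for every offense type
```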

[Figure: theft incidents plotted on the map]

Let's only plot the first 10000. To do this, write:


> mydata3_THEFT_less <- mydata3_THEFT[1:10000, ]
> mydata3_THEFT_less["13651",]
      offense lati long
13651   THEFT   NA   NA
> mydata3_THEFT_less["13652",]
      offense    lati      long
13652   THEFT 41.8301 -87.61853
> mydata3_THEFT["13652",]
      offense    lati      long
13652   THEFT 41.8301 -87.61853

What is going on? As you can see, some entries in the data have missing coordinates, which R denotes NA. This means we are not actually plotting 10000 points! The function complete.cases() returns a logical vector indicating which rows have no missing values. We can use it (or na.omit()) to clean the data before selecting the 10000 sample points.
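A tiny illustration of what complete.cases() and na.omit() return, using the two rows printed above. Note that subsetting and na.omit() both keep the original row names, which is why row 13652 is still labelled 13652 after filtering.

```r
# The two rows printed above, with their original row names
d <- data.frame(
  offense = c("THEFT", "THEFT"),
  lati = c(NA, 41.8301),
  long = c(NA, -87.61853),
  row.names = c("13651", "13652")
)

complete.cases(d)         # FALSE TRUE: the first row has a missing value
d[!complete.cases(d), ]   # show the incomplete row(s)

clean <- na.omit(d)       # drop every row containing an NA
rownames(clean)           # "13652": original row names are preserved
```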


# Drop rows with missing values, then keep only THEFT
newdata <- na.omit(mydata3)
mydata3_THEFT <- newdata[newdata$offense == "THEFT", ]
nrow(mydata3_THEFT)
mydata3_THEFT[!complete.cases(mydata3_THEFT), ]   # should now return no rows

# First 10000 clean THEFT entries, plotted in red
mydata3_THEFT_less <- mydata3_THEFT[1:10000, ]
ggmap(map_of_Chicago) +
  geom_point(data = mydata3_THEFT_less, aes(x = long, y = lati),
             fill = "red", alpha = 0.8, size = 1, shape = 21)

Dropping incomplete rows will, to some extent, affect any conclusions drawn from this data! This dataset is certainly not pretty and not 'perfect'.

Now we see that downtown Chicago is, not surprisingly, a big theft cluster.
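One caveat: the first 10000 rows are simply whatever order the CSV is in (possibly sorted by date), so they may not represent the whole period. A minimal alternative, swapping in simple random sampling with base R's sample(), is sketched below; sample_rows() is a hypothetical helper name and the toy data frame stands in for mydata3_THEFT.

```r
# Hypothetical helper: draw n rows at random (without replacement)
# instead of taking the first n rows
sample_rows <- function(d, n) {
  d[sample(nrow(d), min(n, nrow(d))), , drop = FALSE]
}

# Toy stand-in for mydata3_THEFT: 100 rows with an id column
toy <- data.frame(id = 1:100, lati = seq(41.70, 42.00, length.out = 100))

set.seed(2014)            # fixes the draw so the plot is reproducible
s <- sample_rows(toy, 10)
nrow(s)

# On the real data this would be:
# mydata3_THEFT_less <- sample_rows(mydata3_THEFT, 10000)
```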

We can save the work by writing:

dev.copy(png,'myplot.png', width=1200, height=670)
dev.off()

[Figure: the saved plot, myplot.png]

Now let's go to the South Side of Chicago, around Hyde Park and the UChicago campus, and center the map on lon = -87.60, lat = 41.80. As well as plotting the points, we can also draw a nice heat map!
We notice, however, that the map is distorted. One reason for this is that the data extend outside the box defined by the zoom level we chose. In the next few paragraphs I will show how the size and location of your data can cause this problem (or keep you from noticing it in the first place), and how to solve it!


map_of_Chicago3 <- get_map(location = c(lon = -87.60, lat = 41.80), zoom = 15,
                           maptype = "roadmap", scale = 2)

# Points only
ggmap(map_of_Chicago3) +
  geom_point(data = mydata3_THEFT_less, aes(x = long, y = lati),
             fill = "red", alpha = 0.8, size = 1, shape = 21)

# Heat map: filled 2D density contours with 15 bins
ggmap(map_of_Chicago3) +
  stat_density2d(aes(x = long, y = lati, fill = ..level.., alpha = ..level..),
                 size = 2, bins = 15, data = mydata3_THEFT_less, geom = "polygon") +
  scale_fill_gradient("Theft\nDensity") +
  scale_alpha(range = c(.4, .75), guide = "none") +
  guides(fill = guide_colorbar(barwidth = 1.5, barheight = 10))

# Heat map with 8 bins and the points drawn on top
ggmap(map_of_Chicago3) +
  stat_density2d(aes(x = long, y = lati, fill = ..level.., alpha = ..level..),
                 size = 2, bins = 8, data = mydata3_THEFT_less, geom = "polygon") +
  scale_fill_gradient("Theft\nDensity") +
  scale_alpha(range = c(.4, .75), guide = "none") +
  geom_point(data = mydata3_THEFT_less, aes(x = long, y = lati),
             size = 2, fill = "red", shape = 21) +
  guides(fill = guide_colorbar(barwidth = 1.5, barheight = 10))

If we zoom out a bit we have an issue...

[Figure: zoomed-out density map with distorted contours]

If we zoom out far enough, past the extent of the data, things start to look good!

[Figure: density map zoomed out past the extent of the data]
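Here is one way to see why the zoom level matters, under an assumption I have not checked against ggmap's source: that ggmap() sets the x and y scale limits to the map's bounding box, in which case ggplot2 removes points outside those limits before stat_density2d() estimates the density (as far as I know, stat_density2d() uses MASS::kde2d() internally). The sketch below uses simulated points, not the Chicago data, and compares a kde2d() estimate from all points with one from only the points inside a small hypothetical window.

```r
library(MASS)  # kde2d(); ships with R as a recommended package

set.seed(1)
# Simulated (hypothetical) incidents clustered around lon -87.60, lat 41.80
x <- rnorm(2000, mean = -87.60, sd = 0.01)
y <- rnorm(2000, mean = 41.80, sd = 0.01)

# A hypothetical zoomed-in window that cuts through the cluster
xlim <- c(-87.61, -87.59)
ylim <- c(41.79, 41.81)
keep <- x >= xlim[1] & x <= xlim[2] & y >= ylim[1] & y <= ylim[2]

# Density on the same 50 x 50 grid: from all points vs. only those in view
full    <- kde2d(x, y, n = 50, lims = c(xlim, ylim))
clipped <- kde2d(x[keep], y[keep], n = 50, lims = c(xlim, ylim))

sum(!keep)                     # points dropped before estimation
max(abs(full$z - clipped$z))   # non-zero: the two surfaces differ
```

When the window is large enough to contain all the data, keep is all TRUE and the two estimates coincide, which matches what we see when zooming out past the data.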

In progress

The solution is along the lines of what is shown below, and I will update this with more detail shortly.

Finally we have:

[Figure: heat map of theft on the South Side]

Next, we can take specific dates and show the evolution of crime patterns year by year! Or maybe even catch seasonal trends in Hyde Park, when i) new students arrive in the Fall, and ii) spring comes in March-ish. This is Chicago after all!
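As a starting point for that, counting incidents per year and flagging the Fall months only needs base R date handling. A minimal sketch: the date values are made up, the "MM/DD/YYYY hh:mm:ss AM" layout is an assumption about the portal's export, and the three-column mydata3 above has dropped the date, so you would need to keep that column when subsetting.

```r
# Hypothetical date strings in an assumed "MM/DD/YYYY hh:mm:ss AM" layout
dates_raw <- c("09/28/2013 11:15:00 PM", "10/02/2013 08:00:00 AM",
               "03/15/2014 01:30:00 PM", "09/30/2014 06:45:00 PM")

# Keep only the date part and parse it
d <- as.Date(substr(dates_raw, 1, 10), format = "%m/%d/%Y")

by_year <- table(format(d, "%Y"))       # incidents per year
month   <- as.integer(format(d, "%m"))
fall    <- month %in% 9:11              # September to November

table(year = format(d, "%Y"), fall = fall)
```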