• Assigned: Friday, January 22, 2016
• Due: By the beginning of class on Tuesday, February 2, 2016
• Submit via GitHub


In this assignment we will further explore the Chicago crime data. This time we will look at a larger portion of the data and augment our analysis by including data about socioeconomics, population, and police stations.

To get credit for your answer, you must show in full the code that produced the answer.

Question 1: Crime counts and socioeconomics

Download the crime data for all of 2015, along with the socioeconomic data.

(a) Calculate the number of crimes in each Community Area in 2015.
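One possible sketch for part (a), assuming the 2015 crimes have been loaded into a pandas DataFrame with a numeric `Community Area` column. The tiny inline table is made-up stand-in data, not the real download.

```python
import pandas as pd

# Made-up stand-in for the 2015 crimes (assumed column name: 'Community Area').
# Some real rows have no Community Area, so the column is read as float with NaN.
crimes = pd.DataFrame({'Community Area': [25, 25, 8, 43, 25, 8, None]})

# Drop crimes with no area, restore integer area numbers, and count per area.
counts = (crimes['Community Area']
          .dropna()
          .astype(int)
          .value_counts()
          .sort_index())
print(counts)
```

`crimes.groupby('Community Area').size()` gives the same counts; `value_counts` is just a compact spelling.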

(b) Sort the Community Areas by 2015 crime count. Which Community Area (by name) has the highest crime count? Which has the lowest?
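A sketch of part (b), assuming the counts from (a) are indexed by Community Area number and that a number-to-name lookup is taken from the socioeconomic data. The column names `Community Area Number` and `COMMUNITY AREA NAME` are assumptions to check against the actual file; the names and counts below are placeholders.

```python
import pandas as pd

# Placeholder crime counts indexed by Community Area number (made-up values).
counts = pd.Series({8: 2, 25: 3, 43: 1}, name='crimes')

# Placeholder lookup table standing in for the socioeconomic data (assumed headers).
areas = pd.DataFrame({'Community Area Number': [8, 25, 43],
                      'COMMUNITY AREA NAME': ['Area A', 'Area B', 'Area C']})
names = areas.set_index('Community Area Number')['COMMUNITY AREA NAME']

# Sort from most to fewest crimes, then attach each area's name.
ranked = counts.sort_values(ascending=False).to_frame().join(names)
print(ranked)
print('Highest:', ranked.iloc[0]['COMMUNITY AREA NAME'])
print('Lowest:', ranked.iloc[-1]['COMMUNITY AREA NAME'])
```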

(c) Create a table whose rows are the days of the year and whose columns are the crime counts for the 77 Community Areas. Select a few Community Areas that interest you and plot their time series.
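A sketch of the day-by-area table for part (c), assuming a `Date` column exported as text like `01/01/2015 11:00:00 PM` (the format string is an assumption; adjust it if the export differs). The rows are made-up stand-in data.

```python
import pandas as pd

# Made-up stand-in crimes with assumed 'Date' and 'Community Area' columns.
crimes = pd.DataFrame({
    'Date': ['01/01/2015 11:00:00 PM', '01/01/2015 01:30:00 AM',
             '01/02/2015 08:15:00 PM', '01/03/2015 09:00:00 AM'],
    'Community Area': [25, 8, 25, 8],
})
crimes['Date'] = pd.to_datetime(crimes['Date'], format='%m/%d/%Y %I:%M:%S %p')

# Rows: calendar days; columns: Community Areas 1..77; cells: crime counts.
daily = (crimes.groupby([crimes['Date'].dt.date, 'Community Area'])
               .size()
               .unstack(fill_value=0)
               .reindex(columns=range(1, 78), fill_value=0))
print(daily[[8, 25]])
```

With matplotlib installed, something like `daily[[8, 25]].plot()` draws the selected areas' time series.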

(d) By joining with the socioeconomic data, create a scatter plot of crime counts against per capita income. Summarize the relationship in words.
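A sketch of the join in part (d), assuming the socioeconomic file identifies areas by a `Community Area Number` column and has a `PER CAPITA INCOME` column (check the real headers, which may differ slightly). The values are invented only to exercise the code; the correlation coefficient is one optional number to accompany the verbal summary.

```python
import pandas as pd

# Invented stand-in counts and socioeconomic rows (assumed column names).
counts = pd.DataFrame({'Community Area': [1, 2, 3, 4], 'crimes': [900, 700, 400, 200]})
socio = pd.DataFrame({'Community Area Number': [1, 2, 3, 4],
                      'PER CAPITA INCOME': [15000, 22000, 40000, 60000]})

# Join the two tables on the Community Area number.
merged = counts.merge(socio, left_on='Community Area', right_on='Community Area Number')

# Pearson correlation between crime count and per capita income.
r = merged['crimes'].corr(merged['PER CAPITA INCOME'])
print(round(r, 3))
# merged.plot.scatter(x='PER CAPITA INCOME', y='crimes')  # requires matplotlib
```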

Question 2: Community Area populations

Download the census block population data and the Community Area tracts mapping.

(a) Join these two datasets, using the fact that the last six digits of the tract id in the mapping data correspond to the first six digits of the block id. However, the data portal has a bug: if a block id starts with a zero, that leading zero is missing!
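One way to undo the missing-zero bug is to left-pad the block id before slicing. This sketch assumes a complete block id is 10 digits (a 6-digit tract followed by a 4-digit block); the column names `block`, `tract`, `population`, and `community_area` are hypothetical, and the rows are stand-in data.

```python
import pandas as pd

# Stand-in block populations; the first two ids lost their leading zero (9 digits).
blocks = pd.DataFrame({'block': [101001000, 101001001, 8392002003],
                       'population': [120, 80, 300]})
# Stand-in tract-to-Community-Area mapping (hypothetical column names).
tracts = pd.DataFrame({'tract': ['17031010100', '17031839200'],
                       'community_area': [1, 2]})

# Restore the leading zero (assumed 10-digit ids), then keep the 6-digit tract prefix.
blocks['tract6'] = blocks['block'].astype(str).str.zfill(10).str[:6]
# The last six digits of the tract id identify the same tract.
tracts['tract6'] = tracts['tract'].str[-6:]

joined = blocks.merge(tracts, on='tract6')
print(joined)
```

Reading the tract file with `dtype=str` avoids a similar problem if any tract ids could start with zero.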

(b) Calculate the total population in each Community Area.
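Once each block carries a Community Area, part (b) is a grouped sum. A minimal sketch on a stand-in for the joined table (hypothetical column names):

```python
import pandas as pd

# Stand-in for the block/Community Area join from part (a) (hypothetical columns).
joined = pd.DataFrame({'community_area': [1, 1, 2, 2, 2],
                       'population': [120, 80, 300, 50, 25]})

# Total population per Community Area.
population = joined.groupby('community_area')['population'].sum()
print(population)
```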

Question 3: Crime rates

Using your answer to Question 2, calculate the crime rate (defined as the number of crimes per thousand residents) for each of the city's Community Areas in 2015. Then re-answer parts (1a)-(1d) with crime count replaced by crime rate. Summarize your findings in words.
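The rate is just count divided by population, times 1,000. A sketch with made-up numbers, assuming both inputs are Series indexed by Community Area number so pandas aligns them automatically; the city-wide figure divides the totals rather than averaging the area rates.

```python
import pandas as pd

# Made-up crime counts and populations indexed by Community Area number.
counts = pd.Series({1: 900, 2: 400})
population = pd.Series({1: 50000, 2: 20000})

# Crimes per 1,000 residents for each area (Series align on the index).
rate = counts / population * 1000
print(rate)

# City-wide rate from the totals.
city_rate = counts.sum() / population.sum() * 1000
print(city_rate)
```

Here area 1 has 900 / 50,000 × 1,000 = 18 crimes per thousand residents and area 2 has 20.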

Question 4: Crime and Police Stations

Download the police stations data.

(a) Extract the latitudes and longitudes of the police stations (found in the LOCATION column) as floats into their own columns called 'Station Latitude' and 'Station Longitude', respectively.
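A sketch for part (a) using a regular expression. The exact LOCATION text format is an assumption here, so the pattern searches for a `(latitude, longitude)` pair anywhere in the field, which also tolerates an address in front of it. The station rows and coordinates are stand-in data.

```python
import pandas as pd

# Stand-in stations; LOCATION format is assumed to contain "(lat, lon)" somewhere.
stations = pd.DataFrame({
    'DISTRICT': ['1', 'Headquarters'],
    'LOCATION': ['123 Example St\n(41.8584, -87.6274)', '(41.8308, -87.6237)'],
})

# Capture the two numbers inside the parentheses as text, then convert to float.
coords = stations['LOCATION'].str.extract(r'\(\s*(-?\d+\.?\d*)\s*,\s*(-?\d+\.?\d*)\s*\)')
stations['Station Latitude'] = coords[0].astype(float)
stations['Station Longitude'] = coords[1].astype(float)
print(stations[['DISTRICT', 'Station Latitude', 'Station Longitude']])
```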

(b) Join the crime data with the stations on police district. Hint: the station district is a text field (because one of its values is 'Headquarters'), so you'll need to convert the crime district to text as well.
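One pitfall: if the crime `District` column contains blanks, pandas reads it as float, and converting `1.0` straight to text gives `'1.0'`, which will not match the station's `'1'`. A sketch that goes through `int` first, assuming the station district column is named `DISTRICT` (check the real header) and using stand-in rows:

```python
import pandas as pd

# Stand-in crimes: District is float (1.0, 12.0) because one row is blank.
crimes = pd.DataFrame({'ID': [10, 11, 12], 'District': [1.0, 12.0, None]})
# Stand-in stations with a text district column (assumed name 'DISTRICT').
stations = pd.DataFrame({'DISTRICT': ['1', '12', 'Headquarters'],
                         'Station Latitude': [41.86, 41.87, 41.83],
                         'Station Longitude': [-87.63, -87.66, -87.62]})

# Drop crimes with no district, then convert 1.0 -> 1 -> '1' so the keys match.
crimes = crimes.dropna(subset=['District'])
crimes = crimes.assign(District=crimes['District'].astype(int).astype(str))

df = crimes.merge(stations, left_on='District', right_on='DISTRICT')
print(df)
```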

(c) Define a function which calculates the distance in kilometers between two points (latitude, longitude) using the Pythagorean theorem.

Hint: To convert the coordinate distance to kilometers, multiply by 95. For example, the distance from (41, -87) to (41.1, -87) is about 9.5 km. This is the scale factor for these coordinates near Chicago. Note that this method is approximate because the scale factor varies from point to point (i.e., the Earth is not flat!).
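One possible shape for part (c), taking two points as separate coordinates. The name `distance_km` and the signature are illustrative choices, not a required interface; `math.hypot` computes the Pythagorean length sqrt(dlat² + dlon²) in degrees, which is then scaled by the hint's factor of 95.

```python
import math

KM_PER_DEGREE = 95  # approximate scale factor near Chicago, from the hint

def distance_km(lat1, lon1, lat2, lon2):
    """Approximate distance in km using the Pythagorean theorem on raw degrees."""
    return math.hypot(lat2 - lat1, lon2 - lon1) * KM_PER_DEGREE

print(round(distance_km(41, -87, 41.1, -87), 2))  # 9.5
```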

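(d) Calculate the distance between each crime and its district police station. Hint: if your answer to (c) is of the form

import math

def distance(row):
    crime_lat = row['Latitude']
    crime_lon = row['Longitude']
    station_lat = row['Station Latitude']
    station_lon = row['Station Longitude']
    # Pythagorean distance in degrees, scaled by 95 to approximate kilometers
    return math.sqrt((crime_lat - station_lat) ** 2 + (crime_lon - station_lon) ** 2) * 95

then, with df being the joined table from (b), you can simply do

df['Distance'] = df.apply(distance, axis=1)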
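`df.apply(..., axis=1)` calls Python once per row, which can be slow on a full year of crimes. A vectorized alternative (not required by the assignment) applies the same formula to whole columns with `numpy.hypot`. The rows below are stand-in data using the column names from the hint.

```python
import numpy as np
import pandas as pd

KM_PER_DEGREE = 95  # approximate scale factor near Chicago

# Stand-in joined table with the column names used in the hint.
df = pd.DataFrame({'Latitude': [41.1, 41.0], 'Longitude': [-87.0, -87.1],
                   'Station Latitude': [41.0, 41.0], 'Station Longitude': [-87.0, -87.0]})

# Same Pythagorean formula as the row-wise function, computed column-wise.
df['Distance'] = np.hypot(df['Latitude'] - df['Station Latitude'],
                          df['Longitude'] - df['Station Longitude']) * KM_PER_DEGREE
print(df['Distance'])
```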

(e) Plot a histogram of the distances between crimes and their district police stations. Summarize the distribution in words.
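The bin counts behind a histogram, plus a few summary statistics, make the verbal summary easier to write. A sketch on made-up distances; with matplotlib installed, `distances.hist(bins=...)` draws the plot itself.

```python
import numpy as np
import pandas as pd

# Made-up distances in km; in practice use the distance column from part (d).
distances = pd.Series([0.4, 1.1, 1.5, 2.0, 2.2, 2.9, 3.5, 4.8, 7.5])

# Counts per bin: [0,1), [1,2), [2,3), [3,4), [4,5), [5,10].
counts, edges = np.histogram(distances, bins=[0, 1, 2, 3, 4, 5, 10])
print(counts, edges)

# Median, quartiles, and maximum help describe the shape and the long tail.
print(distances.describe())
```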