  • Assigned: Thursday, January 7, 2016
  • Due: By 5PM on Friday, January 15, 2016
  • Submit by email to: computationforpolicy@lists.uchicago.edu


In this assignment you will practice basic data manipulation and plotting using Python. We will use City of Chicago crime statistics from the open data portal as our data source. First, go to the data portal and view the crime statistics data available:


Since there’s quite a lot of data, for this assignment let’s filter down the data a bit so we have a more manageable dataset.

Go to Filter and apply a Filter on the Date column to select crimes after 12/01/2015 12:00:00 AM. This should produce a dataset with ~18,000 crimes (approximate as the data may be updated as you do the assignment). Now, select Export, and download the data in ‘csv’ format, a text format where each field is separated by a comma.

Your solution will be a Python script that does the following basic computations. Use whichever modules you think are appropriate. Denote the solution to each part with a comment, e.g.:

# Part I: Question 1
code goes here

To get credit for your answer, you must show in full the code that produced the answer.

Part I: Data Manipulation

Question 1: Write a piece of code that reads in the data to a format that you can use.
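As a starting point, here is a minimal sketch of one way to load comma-separated data using only the standard library. The three-row sample and its column names (ID, Date, Primary Type, Arrest, District) are assumptions made for illustration; check them against the header row of the file you actually downloaded.

```python
import csv
import io

# Hypothetical three-row sample standing in for the downloaded file.
# The column names are assumptions; compare them with your own header row.
SAMPLE_CSV = """ID,Date,Primary Type,Arrest,District
1,12/01/2015 12:30:00 AM,THEFT,false,011
2,12/01/2015 01:15:00 AM,WEAPONS VIOLATION,true,007
3,12/02/2015 09:00:00 PM,THEFT,false,011
"""


def read_crimes(handle):
    """Return the rows of an open CSV file as a list of dicts keyed by column name."""
    return list(csv.DictReader(handle))


crimes = read_crimes(io.StringIO(SAMPLE_CSV))
print(len(crimes), "rows; first row:", crimes[0])
```

To read a file on disk instead, pass an open file handle, e.g. open("crimes.csv", newline=""), where the file name is a placeholder for whatever you saved the export as. Note that every value comes back as a string, so fields such as Arrest need converting before you compute with them.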

Let’s now look at some descriptive statistics.

Question 2: How often did a crime result in an arrest?

Question 3: Which types of crimes most often result in arrest?
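Note that "most often" can be read two ways: the largest number of arrests, or the largest share of crimes of that type ending in arrest. The sketch below illustrates the second reading on made-up records; the (type, arrested) pairs are hypothetical and only show the grouping pattern, not real results.

```python
from collections import Counter

# Hypothetical (primary type, arrest made?) pairs, for illustration only.
records = [
    ("THEFT", False), ("THEFT", True), ("THEFT", False), ("THEFT", False),
    ("NARCOTICS", True), ("NARCOTICS", True),
    ("BATTERY", True), ("BATTERY", False),
]

totals = Counter()   # crimes per type
arrests = Counter()  # arrests per type
for primary_type, arrested in records:
    totals[primary_type] += 1
    if arrested:
        arrests[primary_type] += 1

# Share of crimes of each type that led to an arrest.
rates = {t: arrests[t] / totals[t] for t in totals}

# Types ordered from highest to lowest arrest rate.
ranked = sorted(rates.items(), key=lambda item: item[1], reverse=True)
for primary_type, rate in ranked:
    print(f"{primary_type}: {rate:.0%} of {totals[primary_type]} crimes")
```

Printing the totals next to each rate is a deliberate choice: a type with very few crimes can show an extreme rate that means little on its own.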

Question 4: What is the number of weapons violations (one of the Primary Types) per district?

Question 5: What is the number of arrests per day of the week? Which day of the week has the most arrests?
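Grouping by day of the week requires turning the date text into a real date first. The sketch below assumes the Date column uses the same layout as the filter value above (12/01/2015 12:00:00 AM) and that Arrest is stored as the text "true" or "false"; both are assumptions to verify against your file, and the rows themselves are made up.

```python
from collections import Counter
from datetime import datetime

# Assumed layout of the Date column, matching "12/01/2015 12:00:00 AM".
DATE_FORMAT = "%m/%d/%Y %I:%M:%S %p"
# datetime.weekday() returns 0 for Monday through 6 for Sunday.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# Hypothetical rows, for illustration only.
rows = [
    {"Date": "12/01/2015 12:00:00 AM", "Arrest": "true"},
    {"Date": "12/01/2015 11:45:00 PM", "Arrest": "true"},
    {"Date": "12/04/2015 08:30:00 AM", "Arrest": "false"},
    {"Date": "12/05/2015 02:10:00 PM", "Arrest": "true"},
]

arrests_by_day = Counter()
for row in rows:
    if row["Arrest"].strip().lower() == "true":
        when = datetime.strptime(row["Date"], DATE_FORMAT)
        arrests_by_day[DAY_NAMES[when.weekday()]] += 1

busiest_day, busiest_count = arrests_by_day.most_common(1)[0]
print(arrests_by_day)
print("Most arrests:", busiest_day, busiest_count)
```

When interpreting the real result, keep in mind that a one-month window does not contain the same number of each weekday, so raw totals slightly favor the days that occur five times.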

Part II: Basic Plotting

In this part we’ll make a few plots to visualize the data.

Question 6: Make bar charts that show (a) the result of Question 4 and (b) Question 5.

Question 7: Make a scatter plot of latitude versus longitude (we’ll get more into making real maps later in the course) for those crimes where the Primary Type was: deceptive practice.

Question 8: Make a histogram that shows the number of arrests per beat. The Y-axis should show Frequency and the X-axis should show the count of arrests in that beat.
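The histogram in Question 8 takes two counting steps: first count arrests within each beat, then count how many beats share each arrest total. The sketch below shows only that data preparation, on made-up beat codes, and prints a rough text version of the result; the plotting itself is left to whichever library you choose.

```python
from collections import Counter

# Hypothetical beat codes, one entry per arrest, for illustration only.
arrest_beats = ["0111", "0111", "0111", "0423", "0423", "1834", "2222", "2222"]

# Step 1: number of arrests in each beat.
arrests_per_beat = Counter(arrest_beats)

# Step 2: number of beats (Y-axis) that have each arrest count (X-axis).
frequency = Counter(arrests_per_beat.values())

# Rough text rendering: one '#' per beat with that many arrests.
for arrest_count in sorted(frequency):
    print(f"{arrest_count} arrest(s): {'#' * frequency[arrest_count]}")
```

Beats with no arrests never appear in a list built this way, so decide whether your histogram should include a zero bin and add those beats explicitly if so. With a plotting library, the values of arrests_per_beat are what you would pass to its histogram function.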

Challenge 1 (ungraded): Make a time series plot of arrests per date in that month.